[JBoss JIRA] (ISPN-4567) InfinispanLuceneDirectoryIT random failure: LifecycleException: The server is already running
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4567?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4567:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> InfinispanLuceneDirectoryIT random failure: LifecycleException: The server is already running
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-4567
> URL: https://issues.jboss.org/browse/ISPN-4567
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Galder Zamarreño
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> {{ManagedDeployableContainer}} detects a running container, even though the previous module killed its server:
> {noformat}
> [10:25:05] : [org.infinispan:infinispan-as-module-client-integrationtests] kill_server:
> [10:25:06] : [org.infinispan:infinispan-as-module-client-integrationtests] [echo] Killing Infinispan server with PID - 3658 29739
> [10:25:06] : [org.infinispan:infinispan-as-module-client-integrationtests] [delete] Deleting: /mnt/ebs/TeamCity/buildAgent/work/master/integrationtests/as-integration-client/jps.pid
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] org.infinispan.test.integration.as.InfinispanLuceneDirectoryIT Time elapsed: 3.285 sec <<< ERROR!
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] org.jboss.arquillian.container.spi.client.container.LifecycleException: The server is already running! Managed containers do not support connecting to running server instances due to the possible harmful effect of connecting to the wrong server. Please stop server before running or change to another type of container.
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] To disable this check and allow Arquillian to connect to a running server, set allowConnectingToRunningServer to true in the container configuration
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.as.arquillian.container.managed.ManagedDeployableContainer.failDueToRunning(ManagedDeployableContainer.java:358)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.as.arquillian.container.managed.ManagedDeployableContainer.startInternal(ManagedDeployableContainer.java:88)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.as.arquillian.container.CommonDeployableContainer.start(CommonDeployableContainer.java:112)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.arquillian.container.impl.ContainerImpl.start(ContainerImpl.java:199)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController$8.perform(ContainerLifecycleController.java:163)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController$8.perform(ContainerLifecycleController.java:157)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController.forContainer(ContainerLifecycleController.java:255)
> [10:25:20] : [org.infinispan:infinispan-as-lucene-integration] at org.jboss.arquillian.container.impl.client.container.ContainerLifecycleController.startContainer(ContainerLifecycleController.java:156)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
[JBoss JIRA] (ISPN-4566) ManualIndexingTest.testManualIndexing random failures
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4566?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4566:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> ManualIndexingTest.testManualIndexing random failures
> -----------------------------------------------------
>
> Key: ISPN-4566
> URL: https://issues.jboss.org/browse/ISPN-4566
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Query
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Sanne Grinovero
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> Random timeouts when TRACE logging is enabled:
> {noformat}
> 04:58:33,679 ERROR (testng-ManualIndexingTest:) [UnitTestTestNGListener] Test testManualIndexing(org.infinispan.query.api.ManualIndexingTest) failed.
> org.infinispan.commons.CacheException: java.util.concurrent.ExecutionException: Map phase executing at ManualIndexingTest-NodeA-44176 did not complete within 20 sec timeout
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeHelper(MapReduceTask.java:506)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:407)
> at org.infinispan.query.impl.massindex.MapReduceMassIndexer.start(MapReduceMassIndexer.java:25)
> at org.infinispan.query.api.ManualIndexingTest.testManualIndexing(ManualIndexingTest.java:52)
> {noformat}
> Trace log here: http://ci.infinispan.org/viewLog.html?buildId=9816&buildTypeId=Infinispan...
[JBoss JIRA] (ISPN-4572) StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4572?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4572:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx random failures
> ------------------------------------------------------------------------------------------------
>
> Key: ISPN-4572
> URL: https://issues.jboss.org/browse/ISPN-4572
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer, Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> {noformat}
> java.lang.AssertionError:
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282)
> at org.testng.AssertJUnit.assertNull(AssertJUnit.java:274)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.doWritingCacheTest(StateTransferReplicationQueueTest.java:144)
> at org.infinispan.statetransfer.StateTransferReplicationQueueTest.testStateTransferWithNodeRestartedAndBusyNonTx(StateTransferReplicationQueueTest.java:88)
> {noformat}
> No trace log available for now.
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4575?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4575:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> Map/Reduce incorrect results with a non-shared non-tx intermediate cache
> ------------------------------------------------------------------------
>
> Key: ISPN-4575
> URL: https://issues.jboss.org/browse/ISPN-4575
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Vladimir Blagojevic
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> In a non-tx cache, if a command is started with topology id {{T}}, and when it is replicated on another node the distribution interceptor sees topology {{T+1}}, it throws an {{OutdatedTopologyException}}. The originator of the command will then retry the command, setting topology {{T+1}}.
> When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
> Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}} - see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
> _A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ then installs topology {{T+1}} and resends the command to _C_ (now the primary owner), which replicates it to _B_ and applies it locally a second time.
> This scenario can happen during an M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member; it doesn't wait for the cache to have a certain number of members or for state transfer to be complete for all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
> I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
> A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValueCommand(k, DeltaAwareList)}} handle {{OutdatedTopologyException}}.
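The duplicate intermediate value described in this report can be replayed with a toy model. This is only a sketch under simplifying assumptions: `ToyNode`, `applyDelta` and `valuesOnC` are hypothetical names, no Infinispan classes are used, a `false` return stands in for {{OutdatedTopologyException}}, and _A_'s own local copy is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: ToyNode and applyDelta are hypothetical, not Infinispan classes.
class DuplicateDeltaSketch {

    static final class ToyNode {
        int topologyId;
        final Map<String, List<String>> data = new HashMap<>();

        ToyNode(int topologyId) {
            this.topologyId = topologyId;
        }

        // Returns false when the command carries an older topology id than the node's
        // (standing in for OutdatedTopologyException); otherwise appends the delta.
        boolean applyDelta(int commandTopologyId, String key, String value) {
            if (commandTopologyId < topologyId) {
                return false;
            }
            data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            return true;
        }
    }

    // Replays the scenario and returns the values C ends up holding for key "k".
    static List<String> valuesOnC() {
        int t = 1;
        ToyNode b = new ToyNode(t + 1); // B has already installed T+1
        ToyNode c = new ToyNode(t);     // C is still on T

        // First attempt, sent by A with topology T: C applies it, B rejects it.
        c.applyDelta(t, "k", "v1");
        boolean acceptedByB = b.applyDelta(t, "k", "v1");

        if (!acceptedByB) {
            // Retry with topology T+1: C is now the primary owner and appends again.
            c.topologyId = t + 1;
            c.applyDelta(t + 1, "k", "v1");
            b.applyDelta(t + 1, "k", "v1");
        }
        return c.data.get("k");
    }

    public static void main(String[] args) {
        System.out.println(valuesOnC()); // prints [v1, v1]
    }
}
```

In this model _B_ ends up with a single `v1` while _C_ holds it twice, because an append is not idempotent under retry.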
[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4568?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4568:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4568
> URL: https://issues.jboss.org/browse/ISPN-4568
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> Very likely related to ISPN-4564, as there seem to be two unjustified pauses of about 3 s each, and some log messages also appear to be delayed:
> {noformat}
> 08:23:48,443 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@e9a3538]
> 08:23:48,470 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAN-7764, DistSyncL1RepeatableReadFuncTest-NodeAM-739], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62801f8c]
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [DistSyncL1RepeatableReadFuncTest-NodeAK-9309]. Use multicast? false
> 08:23:51,060 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAK-9309], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 08:23:51,062 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [BaseRpcInvokingCommand] Invoking command InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}, with originLocal flag set to false
> 08:23:50,972 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [CallInterceptor] Executing command: PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.
> 08:23:51,786 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [InboundInvocationHandlerImpl] About to send back response null for command SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}
> 08:23:51,796 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [CommandAwareRpcDispatcher] Responses: [sender=DistSyncL1RepeatableReadFuncTest-NodeAK-9309, received=true, suspected=false]
> 08:23:54,561 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [RpcManagerImpl] Response(s) to SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}} is {}
> 08:23:56,955 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1MultipleConcurrentGetsWithInvalidation(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.distribution.BaseDistSyncL1Test.testNoEntryInL1MultipleConcurrentGetsWithInvalidation(BaseDistSyncL1Test.java:217)
> 08:23:54,578 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1NonTxInterceptor] Allowing entry to commit as local node is owner
> 08:23:57,861 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [EntryWrappingInterceptor] About to commit entry RepeatableReadEntry(499752d9){key=key-to-the-cache, value=second-put, oldValue=first-put, isCreated=false, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}
> {noformat}
[JBoss JIRA] (ISPN-4587) Re-add old owners in the pending CH when a node leaves during rebalance
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4587?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4587:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> Re-add old owners in the pending CH when a node leaves during rebalance
> -----------------------------------------------------------------------
>
> Key: ISPN-4587
> URL: https://issues.jboss.org/browse/ISPN-4587
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Fix For: 7.0.0.CR1
>
>
> Say we have a distributed cache \[A, B\] with {{numSegments = 1}} and {{numOwners = 2}}. The initial topology is _T_: currentCH = \{0: A B\}, pendingCH = null
> C joins, and A starts a rebalance. The topology is now _T + 1_: currentCH = \{0: A B\}, pendingCH = \{0: A C\}
> C now leaves, and A updates the consistent hashes to remove it, with a new topology _T + 2_: currentCH = \{0: A B\}, pendingCH = \{0: A\}
> A doesn't need to receive any data, so the rebalance ends and the pending CH is installed as the current CH in topology _T + 3_: currentCH = \{0: A\}, pendingCH = null
> This algorithm is relatively easy to follow and implement, but it does result in reduced availability of the cache data. It would be better if topology _T + 2_ could re-add B as an owner in the pending CH.
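The difference between the current and the proposed handling of topology _T + 2_ can be sketched for a single segment as follows. This is an illustration under assumptions, not the actual consistent hash factory code: `dropLeaver` and `dropLeaverAndReAdd` are hypothetical helpers, and owner lists are modelled as plain lists of node names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for one segment; not the Infinispan consistent hash API.
class PendingOwnersSketch {

    // Behaviour described in the report: simply drop the leaver from the pending owners.
    static List<String> dropLeaver(List<String> pendingOwners, String leaver) {
        List<String> result = new ArrayList<>(pendingOwners);
        result.remove(leaver);
        return result;
    }

    // Proposed behaviour: drop the leaver, then top up from the current owners
    // until the segment has numOwners owners again (or no candidates remain).
    static List<String> dropLeaverAndReAdd(List<String> currentOwners,
                                           List<String> pendingOwners,
                                           String leaver,
                                           int numOwners) {
        List<String> result = dropLeaver(pendingOwners, leaver);
        for (String owner : currentOwners) {
            if (result.size() >= numOwners) {
                break;
            }
            if (!owner.equals(leaver) && !result.contains(owner)) {
                result.add(owner);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> currentCH = Arrays.asList("A", "B"); // segment 0 in T + 1
        List<String> pendingCH = Arrays.asList("A", "C"); // segment 0 in T + 1

        System.out.println(dropLeaver(pendingCH, "C"));                       // prints [A]
        System.out.println(dropLeaverAndReAdd(currentCH, pendingCH, "C", 2)); // prints [A, B]
    }
}
```

With the second variant, the pending CH in _T + 2_ keeps two owners, so B's copy is not discarded when the rebalance completes.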
[JBoss JIRA] (ISPN-4579) SingleNodeJdbcStoreIT.cleanup NPE after test failure
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-4579?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-4579:
--------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta2)
> SingleNodeJdbcStoreIT.cleanup NPE after test failure
> ----------------------------------------------------
>
> Key: ISPN-4579
> URL: https://issues.jboss.org/browse/ISPN-4579
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Server
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Galder Zamarreño
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> After a failure in {{SingleNodeJdbcStoreIT.testForcedShutdown}}, it seems not all the test stores are properly initialized, and cleanup fails with a NullPointerException:
> {noformat}
> [00:57:07] : [testForcedShutdown] java.lang.AssertionError: null
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertNotNull(Assert.java:621)
> at org.junit.Assert.assertNotNull(Assert.java:631)
> at org.infinispan.server.test.cs.jdbc.SingleNodeJdbcStoreIT.testRestartStringStoreBefore(SingleNodeJdbcStoreIT.java:223)
> at org.infinispan.server.test.cs.jdbc.SingleNodeJdbcStoreIT.testForcedShutdown(SingleNodeJdbcStoreIT.java:163)
> ...
> [00:57:07]W: [org.infinispan.server:test-suite] java.lang.NullPointerException
> at org.infinispan.server.test.cs.jdbc.SingleNodeJdbcStoreIT.cleanup(SingleNodeJdbcStoreIT.java:82)
> {noformat}
> This bug is only about the NPE; the test failure itself is agent-specific (probably caused by the low open files/user processes limits).
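One way a cleanup method could tolerate partially initialized fixtures is to skip anything that is still null. The sketch below is an assumption about the general shape of such a fix: the real fields used at {{SingleNodeJdbcStoreIT.java:82}} are not shown in this report, so `closeAll` and the resources passed to it are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the actual fields of SingleNodeJdbcStoreIT are not known here.
class NullSafeCleanupSketch {

    // Closes every resource that was actually initialized and returns how many were closed.
    static int closeAll(List<AutoCloseable> resources) {
        int closed = 0;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue; // never initialized because the test failed early
            }
            try {
                resource.close();
                closed++;
            } catch (Exception e) {
                // Keep going so one failing close does not hide the original test failure.
                System.err.println("Cleanup failed: " + e);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        AutoCloseable initialized = () -> { };
        // The middle entry models a store that was never set up.
        System.out.println(closeAll(Arrays.asList(initialized, null, initialized))); // prints 2
    }
}
```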