[JBoss JIRA] (ISPN-6236) StateTransferFunctionalTest random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6236?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6236:
-------------------------------
Status: Open (was: New)
> StateTransferFunctionalTest random failures
> -------------------------------------------
>
> Key: ISPN-6236
> URL: https://issues.jboss.org/browse/ISPN-6236
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.2.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 8.2.0.CR1
>
>
> Since the ISPN-6214 fix, the global components are started when the {{DefaultCacheManager}} is started. This includes {{LocalTopologyManager}} and {{ClusterTopologyManager}}, which are created in the {{GlobalComponentRegistry}} constructor and which also pull in the {{Transport}} component.
> In {{StateTransferFunctionalTest}} and its subclasses, this means the JGroups channel is created and joins the cluster before the {{JoiningNode}} registers its view listener. Because the listener is registered after the view update, it doesn't receive any notifications, and {{waitForJoin()}} always times out.
> We should remove the {{JoiningNode}} class altogether, because merges during initial cluster formation are practically impossible with our current test setup.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6236) StateTransferFunctionalTest random failures
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6236:
----------------------------------
Summary: StateTransferFunctionalTest random failures
Key: ISPN-6236
URL: https://issues.jboss.org/browse/ISPN-6236
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.2.0.Beta2
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 8.2.0.CR1
Since the ISPN-6214 fix, the global components are started when the {{DefaultCacheManager}} is started. This includes {{LocalTopologyManager}} and {{ClusterTopologyManager}}, which are created in the {{GlobalComponentRegistry}} constructor and which also pull in the {{Transport}} component.
In {{StateTransferFunctionalTest}} and its subclasses, this means the JGroups channel is created and joins the cluster before the {{JoiningNode}} registers its view listener. Because the listener is registered after the view update, it doesn't receive any notifications, and {{waitForJoin()}} always times out.
We should remove the {{JoiningNode}} class altogether, because merges during initial cluster formation are practically impossible with our current test setup.
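The fix proposed in this ticket is to remove {{JoiningNode}}; the sketch below only illustrates the underlying race. It assumes a hypothetical {{ViewNotifier}} (not a JGroups or Infinispan API) that notifies only the listeners registered before a view is installed. A waiter that relies solely on a future notification misses a view that was installed before it subscribed, while registering the listener first and then checking the current state still sees it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical stand-in for a cluster view source; not a JGroups or Infinispan API.
class ViewNotifier {
    private final List<IntConsumer> listeners = new CopyOnWriteArrayList<>();
    private volatile int memberCount = 1;

    void addListener(IntConsumer listener) {
        listeners.add(listener);
    }

    int currentMemberCount() {
        return memberCount;
    }

    // Installs a new view and notifies only the listeners registered so far.
    void installView(int members) {
        memberCount = members;
        for (IntConsumer l : listeners) {
            l.accept(members);
        }
    }
}

class JoinWaiter {
    // Racy: depends only on a future notification, so a view installed
    // before addListener() is never observed and the wait times out.
    static boolean awaitNotificationOnly(ViewNotifier notifier, int expected, long timeoutMs)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        notifier.addListener(n -> { if (n >= expected) latch.countDown(); });
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Safe: register first, then check the current state, so a view change
    // that happened earlier is still seen.
    static boolean awaitWithStateCheck(ViewNotifier notifier, int expected, long timeoutMs)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        notifier.addListener(n -> { if (n >= expected) latch.countDown(); });
        if (notifier.currentMemberCount() >= expected) {
            return true;
        }
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}

class ListenerRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        ViewNotifier notifier = new ViewNotifier();
        // The "channel" joins before anyone listens, as in the report.
        notifier.installView(2);
        System.out.println(JoinWaiter.awaitNotificationOnly(notifier, 2, 100));
        System.out.println(JoinWaiter.awaitWithStateCheck(notifier, 2, 100));
    }
}
```

Running {{ListenerRaceDemo}} prints false for the notification-only waiter and true for the register-then-check waiter, because the view was installed before either listener existed.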
[JBoss JIRA] (ISPN-5495) ConcurrentStartTest.testConcurrentStart random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5495?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-5495:
------------------------------------
Failures in {{ConcurrentStartTest.testConcurrentStart}} have been more frequent recently, because of a change introduced in ISPN-5883. The first command received by NodeB from NodeA is no longer a {{CacheTopologyControlCommand(POLICY_GET_STATUS)}}, but a {{CacheTopologyControlCommand(POLICY_GET_STATUS)}}.
The test was passing most of the time because the main thread was waiting for the command with {{CheckPoint.await()}}, which doesn't throw an exception on timeout. However, the {{BlockingInboundInvocationHandler}} uses {{CheckPoint.awaitStrict()}} with the same delay, and some of the time it was timing out.
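A minimal sketch of why the two waits behave differently, using a hypothetical single-event {{SimpleCheckPoint}} rather than the real {{CheckPoint}} test class: the lenient {{await}} reports a timeout only through its boolean result, so a caller that ignores the result carries on, while {{awaitStrict}} turns the same timeout into a {{TimeoutException}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simplification for illustration; NOT Infinispan's CheckPoint test class.
class SimpleCheckPoint {
    private final CountDownLatch triggered = new CountDownLatch(1);

    void trigger() {
        triggered.countDown();
    }

    // Lenient: signals a timeout by returning false; callers that ignore
    // the result never notice it.
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return triggered.await(timeout, unit);
    }

    // Strict: converts a timeout into an exception.
    void awaitStrict(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        if (!triggered.await(timeout, unit)) {
            throw new TimeoutException("Checkpoint not triggered within " + timeout + " " + unit);
        }
    }
}

class CheckPointDemo {
    public static void main(String[] args) throws InterruptedException {
        SimpleCheckPoint cp = new SimpleCheckPoint();
        // The main thread's lenient wait quietly returns false...
        boolean ok = cp.await(50, TimeUnit.MILLISECONDS);
        System.out.println("lenient await returned " + ok);
        // ...while a handler using the strict variant with the same delay fails.
        try {
            cp.awaitStrict(50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("strict await failed: " + e.getMessage());
        }
    }
}
```

With the same 50 ms delay and no trigger, the lenient call returns false and the strict call throws, which is the intermittent pattern described in the comment.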
> ConcurrentStartTest.testConcurrentStart random failures
> -------------------------------------------------------
>
> Key: ISPN-5495
> URL: https://issues.jboss.org/browse/ISPN-5495
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 8.2.0.CR1
>
>
> {noformat}
> org.testng.internal.thread.ThreadTimeoutException: Method org.testng.internal.TestNGMethod.testConcurrentStart() didn't finish within the time-out 60000
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.infinispan.test.TestingUtil.waitForRehashToComplete(TestingUtil.java:253)
> {noformat}
[JBoss JIRA] (ISPN-6235) ClusterTopologyManagerImpl join during cluster status recovery
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6235?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6235:
-------------------------------
Status: Open (was: New)
> ClusterTopologyManagerImpl join during cluster status recovery
> --------------------------------------------------------------
>
> Key: ISPN-6235
> URL: https://issues.jboss.org/browse/ISPN-6235
> Project: Infinispan
> Issue Type: Bug
> Reporter: Dan Berindei
> Labels: testsuite_stability
>
> If the joiner has the correct view id, but the current status is
> RECOVERING_CLUSTER, we should wait for the cluster status recovery to
> finish before adding the new member.
> We are currently not doing that, so the new member could be erased by the status recovery process that's in progress. This can happen if the coordinator and the joiner had already been members of the JGroups cluster for some time, and there's no view change when they actually start their caches (exactly the scenario in {{ConcurrentStartTest}}).
[JBoss JIRA] (ISPN-6235) ClusterTopologyManagerImpl join during cluster status recovery
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6235:
----------------------------------
Summary: ClusterTopologyManagerImpl join during cluster status recovery
Key: ISPN-6235
URL: https://issues.jboss.org/browse/ISPN-6235
Project: Infinispan
Issue Type: Bug
Reporter: Dan Berindei
If the joiner has the correct view id, but the current status is
RECOVERING_CLUSTER, we should wait for the cluster status recovery to
finish before adding the new member.
We are currently not doing that, so the new member could be erased by the status recovery process that's in progress. This can happen if the coordinator and the joiner had already been members of the JGroups cluster for some time, and there's no view change when they actually start their caches (exactly the scenario in {{ConcurrentStartTest}}).
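A minimal sketch of the proposed behaviour, assuming a hypothetical {{CoordinatorSketch}} whose names do not mirror {{ClusterTopologyManagerImpl}}: recovery replaces the member list with the members it recovered, so a joiner added while the status is RECOVERING_CLUSTER would be lost. Making the join wait on a condition until the status leaves RECOVERING_CLUSTER keeps the new member.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model for illustration; not ClusterTopologyManagerImpl's real fields or methods.
enum ClusterStatus { RUNNING, RECOVERING_CLUSTER }

class CoordinatorSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition statusChanged = lock.newCondition();
    private ClusterStatus status = ClusterStatus.RUNNING;
    private final List<String> members = new ArrayList<>();

    void startRecovery() {
        lock.lock();
        try {
            status = ClusterStatus.RECOVERING_CLUSTER;
        } finally {
            lock.unlock();
        }
    }

    // Recovery overwrites the member list with what it recovered, which is
    // how a member added concurrently could be erased.
    void finishRecovery(List<String> recoveredMembers) {
        lock.lock();
        try {
            members.clear();
            members.addAll(recoveredMembers);
            status = ClusterStatus.RUNNING;
            statusChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // The join waits for recovery to finish before adding the new member.
    boolean join(String node, long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (status == ClusterStatus.RECOVERING_CLUSTER) {
                if (remaining <= 0) {
                    return false; // recovery did not finish in time
                }
                remaining = statusChanged.awaitNanos(remaining);
            }
            members.add(node);
            return true;
        } finally {
            lock.unlock();
        }
    }

    List<String> members() {
        lock.lock();
        try {
            return new ArrayList<>(members);
        } finally {
            lock.unlock();
        }
    }
}

class JoinDuringRecoveryDemo {
    public static void main(String[] args) throws Exception {
        CoordinatorSketch coordinator = new CoordinatorSketch();
        coordinator.startRecovery();
        Thread joiner = new Thread(() -> {
            try {
                coordinator.join("B", 5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        joiner.start();
        Thread.sleep(100); // give the join a chance to block during recovery
        coordinator.finishRecovery(Collections.singletonList("A"));
        joiner.join();
        System.out.println(coordinator.members());
    }
}
```

The demo prints [A, B] whether the join blocks during recovery or only arrives after it; adding the member without waiting would let {{finishRecovery}} erase it.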
[JBoss JIRA] (ISPN-6235) ClusterTopologyManagerImpl join during cluster status recovery
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6235?page=com.atlassian.jira.plugin.... ]
Dan Berindei reassigned ISPN-6235:
----------------------------------
Assignee: Dan Berindei
> ClusterTopologyManagerImpl join during cluster status recovery
> --------------------------------------------------------------
>
> Key: ISPN-6235
> URL: https://issues.jboss.org/browse/ISPN-6235
> Project: Infinispan
> Issue Type: Bug
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
>
> If the joiner has the correct view id, but the current status is
> RECOVERING_CLUSTER, we should wait for the cluster status recovery to
> finish before adding the new member.
> We are currently not doing that, so the new member could be erased by the status recovery process that's in progress. This can happen if the coordinator and the joiner had already been members of the JGroups cluster for some time, and there's no view change when they actually start their caches (exactly the scenario in {{ConcurrentStartTest}}).
[JBoss JIRA] (ISPN-6087) Hadoop MapReduce task returns wrong results after one of the servers from the cluster is stopped
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-6087?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-6087:
------------------------------------
Status: Open (was: New)
> Hadoop MapReduce task returns wrong results after one of the servers from the cluster is stopped
> -------------------------------------------------------------------------------------------------
>
> Key: ISPN-6087
> URL: https://issues.jboss.org/browse/ISPN-6087
> Project: Infinispan
> Issue Type: Bug
> Components: Hadoop Integration
> Reporter: Anna Manukyan
> Assignee: Gustavo Fernandes
> Attachments: FailoverTest.log
>
>
> The test is performed on a 4-node cluster. Data is read from an Infinispan cache and written to HDFS.
> There are 200000 entries in the cache. As soon as the Hadoop job is submitted to perform the map/reduce task, one of the servers in the cluster is stopped.
> The MapReduce task performs a word count.
> As a result, the Hadoop MapReduce result map contains wrong results: the counts for certain words are greater than expected.
> The failure doesn't always happen; it depends on which stage the MapReduce task was in when the server stopped.
> You can find the test execution log attached.
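The report does not identify the cause. One mechanism that would produce exactly this symptom, offered here as an assumption rather than a diagnosis, is an input split being processed a second time after failover while the output of the first attempt had already been merged. The hypothetical {{WordCountSketch}} below (not code from the Infinispan Hadoop integration) shows how that inflates word counts, and how accepting output at most once per split id avoids it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: splits are modeled as lists of lines, and
// "failover" re-runs a split whose output was already merged.
class WordCountSketch {
    // Naive merge: every completed split attempt is added to the totals.
    static Map<String, Integer> countNaive(List<List<String>> attempts) {
        Map<String, Integer> totals = new HashMap<>();
        for (List<String> split : attempts) {
            mergeCounts(totals, split);
        }
        return totals;
    }

    // Idempotent merge: output is accepted at most once per split id.
    static Map<String, Integer> countOncePerSplit(List<Integer> splitIds, List<List<String>> attempts) {
        Map<String, Integer> totals = new HashMap<>();
        Set<Integer> committed = new HashSet<>();
        for (int i = 0; i < attempts.size(); i++) {
            if (committed.add(splitIds.get(i))) {
                mergeCounts(totals, attempts.get(i));
            }
        }
        return totals;
    }

    // Splits each line on whitespace and increments a counter per word.
    private static void mergeCounts(Map<String, Integer> totals, List<String> lines) {
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    totals.merge(word, 1, Integer::sum);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> split0 = Arrays.asList("a b", "b");
        List<String> split1 = Arrays.asList("c a");
        // Split 1 is attempted twice: once on the stopped server, once after failover.
        List<List<String>> attempts = Arrays.asList(split0, split1, split1);
        List<Integer> ids = Arrays.asList(0, 1, 1);
        System.out.println(countNaive(attempts).get("a"));             // 3 (overcounted)
        System.out.println(countOncePerSplit(ids, attempts).get("a")); // 2
    }
}
```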
[JBoss JIRA] (ISPN-6087) Hadoop MapReduce task returns wrong results after one of the servers from the cluster is stopped
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-6087?page=com.atlassian.jira.plugin.... ]
Work on ISPN-6087 started by Gustavo Fernandes.
-----------------------------------------------
> Hadoop MapReduce task returns wrong results after one of the servers from the cluster is stopped
> -------------------------------------------------------------------------------------------------
>
> Key: ISPN-6087
> URL: https://issues.jboss.org/browse/ISPN-6087
> Project: Infinispan
> Issue Type: Bug
> Components: Hadoop Integration
> Reporter: Anna Manukyan
> Assignee: Gustavo Fernandes
> Attachments: FailoverTest.log
>
>
> The test is performed on a 4-node cluster. Data is read from an Infinispan cache and written to HDFS.
> There are 200000 entries in the cache. As soon as the Hadoop job is submitted to perform the map/reduce task, one of the servers in the cluster is stopped.
> The MapReduce task performs a word count.
> As a result, the Hadoop MapReduce result map contains wrong results: the counts for certain words are greater than expected.
> The failure doesn't always happen; it depends on which stage the MapReduce task was in when the server stopped.
> You can find the test execution log attached.