[JBoss JIRA] (ISPN-9501) AbstractCacheStream.performOperationRehashAware() can hang
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9501?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9501:
--------------------------------
Labels: (was: on-hold)
> AbstractCacheStream.performOperationRehashAware() can hang
> ----------------------------------------------------------
>
> Key: ISPN-9501
> URL: https://issues.jboss.org/browse/ISPN-9501
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 9.4.0.CR3, 9.3.4.Final
>
>
> There are actually 2 different issues:
> # {{segmentsToProcess}} reuses the {{remoteResults.lostSegments}} {{ConcurrentSmallIntSet}} instance, so when {{remoteResults.lostSegments}} is cleared, {{segmentsToProcess}} is also cleared.
> # Because of ISPN-9500, {{segmentsToProcess.isEmpty()}} keeps returning {{false}}, so {{performOperationRehashAware()}} keeps waiting for a newer topology.
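The first problem is a plain aliasing bug: two fields point at one mutable set, so clearing one also clears the other. Below is a minimal sketch of the bug and of a defensive-copy fix. It assumes a `ConcurrentHashMap`-backed set as a stand-in for Infinispan's `ConcurrentSmallIntSet`. The helper names (`newSegmentSet`, `aliasLostSegments`, `copyLostSegments`) are hypothetical, and this is not the actual ISPN-9501 patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentAliasingDemo {
    // Hypothetical stand-in for remoteResults.lostSegments (the real code uses ConcurrentSmallIntSet).
    static Set<Integer> newSegmentSet() {
        return ConcurrentHashMap.newKeySet();
    }

    // Buggy pattern: segmentsToProcess is the very same instance as lostSegments.
    static Set<Integer> aliasLostSegments(Set<Integer> lostSegments) {
        return lostSegments;
    }

    // Fixed pattern: take a defensive copy before lostSegments is cleared and reused.
    static Set<Integer> copyLostSegments(Set<Integer> lostSegments) {
        Set<Integer> copy = newSegmentSet();
        copy.addAll(lostSegments);
        return copy;
    }

    public static void main(String[] args) {
        Set<Integer> lost = newSegmentSet();
        lost.addAll(Set.of(3, 7));
        Set<Integer> aliased = aliasLostSegments(lost);
        Set<Integer> copied = copyLostSegments(lost);
        lost.clear(); // remoteResults.lostSegments is reset for the next round
        System.out.println("aliased=" + aliased + " copied=" + copied);
    }
}
```

Clearing `lost` empties the aliased set but leaves the copy intact. That is why the lost segments silently disappear from the retry when the instance is shared.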
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-9501) AbstractCacheStream.performOperationRehashAware() can hang
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9501?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9501:
--------------------------------
Labels: on-hold (was: )
> AbstractCacheStream.performOperationRehashAware() can hang
> ----------------------------------------------------------
>
> Key: ISPN-9501
> URL: https://issues.jboss.org/browse/ISPN-9501
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: on-hold
> Fix For: 9.4.0.CR3, 9.3.4.Final
>
>
> There are actually 2 different issues:
> # {{segmentsToProcess}} reuses the {{remoteResults.lostSegments}} {{ConcurrentSmallIntSet}} instance, so when {{remoteResults.lostSegments}} is cleared, {{segmentsToProcess}} is also cleared.
> # Because of ISPN-9500, {{segmentsToProcess.isEmpty()}} keeps returning {{false}}, so {{performOperationRehashAware()}} keeps waiting for a newer topology.
[JBoss JIRA] (ISPN-9496) Some xsite tests hang during teardown
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9496?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9496:
--------------------------------
Labels: on-hold testsuite_stability (was: testsuite_stability)
> Some xsite tests hang during teardown
> -------------------------------------
>
> Key: ISPN-9496
> URL: https://issues.jboss.org/browse/ISPN-9496
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.4.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: on-hold, testsuite_stability
> Fix For: 9.4.0.CR3
>
>
> {noformat}
> Test org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.clearContent has been running for more than 300 seconds. Interrupting the test thread and dumping thread stacks of the test suite process and its children.
> Test org.infinispan.xsite.CacheOperationsTest.destroy has been running for more than 300 seconds. Interrupting the test thread and dumping thread stacks of the test suite process and its children.
> ...
> Killed processes 16913
> The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> Error occurred in starting fork, check output in log
> Process Exit Code: 143
> Crashed tests:
> org.infinispan.eviction.impl.ExceptionEvictionTest
> org.infinispan.statetransfer.ClusterTopologyManagerTest
> org.infinispan.stream.LocalStreamOffHeapTest
> {noformat}
> The timeouts are very likely caused by the JGRP-2277 changes. Most of our tests run without any FD* protocol to avoid creating an extra socket + thread, so when the coordinator leaves, the 2nd node *must* receive the leave message from the coordinator or it will never install a view with itself as the coordinator.
> This dependency still existed before JGRP-2277, but it appears the view message sent by the coordinator before leaving was somehow more likely to reach the 2nd node than the new leave message.
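The dependency described above can be illustrated with a deliberately simplified toy model. It is not JGroups code: `Member`, `onLeave` and `coordinatorLeaves` are hypothetical names. The model assumes no failure detector, so an explicit leave message is the only way the second member learns that the coordinator is gone.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaveWithoutFdDemo {
    // Toy model, NOT JGroups: a member whose view only changes when it receives a LEAVE message.
    static final class Member {
        final String name;
        final List<String> view; // ordered; the first member is the coordinator

        Member(String name, List<String> initialView) {
            this.name = name;
            this.view = new ArrayList<>(initialView);
        }

        boolean isCoordinator() {
            return !view.isEmpty() && view.get(0).equals(name);
        }

        // The only path that removes a departed member: there is no FD* protocol in this model.
        void onLeave(String leaver) {
            view.remove(leaver);
        }
    }

    // Coordinator "A" leaves; 'delivered' says whether its LEAVE message reached member "B".
    static Member coordinatorLeaves(boolean delivered) {
        Member second = new Member("B", List.of("A", "B"));
        if (delivered) {
            second.onLeave("A");
        }
        // If the message was lost, nothing ever suspects "A", so B never becomes coordinator.
        return second;
    }

    public static void main(String[] args) {
        System.out.println("LEAVE delivered -> B is coordinator: " + coordinatorLeaves(true).isCoordinator());
        System.out.println("LEAVE dropped   -> B is coordinator: " + coordinatorLeaves(false).isCoordinator());
    }
}
```

In a real stack, a failure-detection protocol would eventually suspect the departed coordinator even if its leave message were lost. Dropping it to save a socket and a thread is what makes the message delivery critical.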
> The "crashed tests" list only includes tests that we know take a very long time to run, so I am assuming they are not relevant. Unfortunately, the mechanism for interrupting long-running tests still isn't working as it should: the thread dumps are not included in the artifacts.
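The log lines above describe a watchdog that interrupts a test after 300 seconds and dumps the thread stacks. Below is a minimal sketch of that technique using only the JDK (`ScheduledExecutorService`, `Thread.getAllStackTraces()`). `TestWatchdog`, `watch` and the dump-file location are hypothetical, and this is not the Infinispan test-suite implementation. The sketch writes the dump to a caller-supplied file before interrupting, so the file could be archived with the build artifacts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TestWatchdog implements AutoCloseable {
    // Single daemon thread, so the watchdog never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "test-watchdog");
        t.setDaemon(true);
        return t;
    });

    // After the timeout: write a dump of all live threads to dumpFile, then interrupt testThread.
    // Cancel the returned future if the test finishes in time.
    public ScheduledFuture<?> watch(Thread testThread, long timeout, TimeUnit unit, Path dumpFile) {
        return scheduler.schedule(() -> {
            StringBuilder dump = new StringBuilder();
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                dump.append('"').append(e.getKey().getName()).append("\"\n");
                for (StackTraceElement frame : e.getValue()) {
                    dump.append("\tat ").append(frame).append('\n');
                }
            }
            try {
                // Persist the dump first, so it survives even if the interrupt does not unblock the test.
                Files.writeString(dumpFile, dump.toString());
            } catch (IOException ex) {
                System.err.println("Could not write thread dump: " + ex);
            }
            testThread.interrupt();
        }, timeout, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        Path dumpFile = Files.createTempFile("thread-dump", ".txt");
        try (TestWatchdog watchdog = new TestWatchdog()) {
            Thread hung = new Thread(() -> {
                try {
                    Thread.sleep(60_000); // simulates a test that hangs
                } catch (InterruptedException e) {
                    System.out.println("test thread interrupted");
                }
            }, "hung-test");
            hung.start();
            watchdog.watch(hung, 100, TimeUnit.MILLISECONDS, dumpFile);
            hung.join();
            System.out.println("dump written to " + dumpFile);
        }
    }
}
```

Interrupting only helps when the hung thread is blocked in an interruptible call. A thread stuck elsewhere (for example, waiting on a view that never arrives in a non-interruptible loop) still needs the dump to be diagnosed.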