[JBoss JIRA] (ISPN-9270) ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-9270?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez updated ISPN-9270:
-----------------------------------------
Sprint: Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33, DataGrid Sprint #34, DataGrid Sprint #36, DataGrid Sprint #37, DataGrid Sprint #38, DataGrid Sprint #39 (was: Sprint 9.3.0.Final, Sprint 9.4.0.Beta1, Sprint 9.4.0.CR1, Sprint 9.4.0.CR3, Sprint 10.0.0.Alpha1, Sprint 10.0.0.Alpha2, Sprint 9.4.0.Final, Sprint 10.0.0.Alpha0, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33, DataGrid Sprint #34, DataGrid Sprint #36, DataGrid Sprint #37, DataGrid Sprint #38)
> ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
> ---------------------------------------------------------------------
>
> Key: ISPN-9270
> URL: https://issues.redhat.com/browse/ISPN-9270
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
>
> The test stops the coordinator A of a 2-node cluster \[A, B\], and then verifies that B stays available.
> Unfortunately that doesn't always happen: A tries to install topology \[B\] before shutting down, but B might receive it too late and reject it. If B expects members \[A, B\] and sees only B, the cache will enter degraded mode:
> {noformat}
> 19:04:05,863 INFO (jgroups-8,Test-NodeB-21374:[org.infinispan.LOCKS]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100008: Updating cache members list [Test-NodeD-51051], topology id 7
> 19:04:05,863 TRACE (jgroups-4,Test-NodeD-51051:[]) [JGroupsTransport] Test-NodeD-51051 received command from Test-NodeB-21374: CacheTopologyControlCommand{cache=org.infinispan.LOCKS, type=CH_UPDATE, sender=Test-NodeB-21374, joinInfo=null, topologyId=7, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[Test-NodeD-51051: 256]}, pendingCH=null, availabilityMode=AVAILABLE, phase=NO_REBALANCE, actualMembers=[Test-NodeD-51051], throwable=null, viewId=1}
> 19:04:05,880 INFO (jgroups-4,Test-NodeD-51051:[]) [CLUSTER] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeD-51051|2] (1) [Test-NodeD-51051]
> 19:04:05,884 DEBUG (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [ClusterCacheStatus] Recovered 1 partition(s) for cache org.infinispan.LOCKS: [CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeB-21374: 124, Test-NodeD-51051: 132]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-51051], persistentUUIDs=[6ac3a341-5af4-4aa2-ae7b-a4183d02d959]}]
> 19:04:05,884 WARN (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [Test-NodeD-51051], lost members are [Test-NodeB-21374], stable members are [Test-NodeB-21374, Test-NodeD-51051]
> 19:04:05,885 INFO (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100011: Entering availability mode DEGRADED_MODE, topology id 8
> 19:04:05,941 DEBUG (transport-thread-Test-NodeD-p100-t2:[Topology-org.infinispan.LOCKS]) [LocalTopologyManagerImpl] Ignoring topology 7 for cache org.infinispan.LOCKS from old coordinator Test-NodeB-21374
> 19:04:06,024 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking
> java.lang.Error: java.util.concurrent.ExecutionException: java.lang.Error: java.util.concurrent.ExecutionException: org.infinispan.lock.exception.ClusteredLockException: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=Test}' is not available. Not all owners are in this partition
> at org.infinispan.functional.FunctionalTestUtils.await(FunctionalTestUtils.java:47) ~[infinispan-core-9.3.0-SNAPSHOT-tests.jar:9.3.0-SNAPSHOT]
> at org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking(ClusteredLockWith2NodesTest.java:49) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
4 years, 11 months
[JBoss JIRA] (ISPN-10887) GlobalJmxStatisticsConfiguration.allowDuplicateDomains is not implemented atomically and can fail frequently
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-10887?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez updated ISPN-10887:
------------------------------------------
Sprint: DataGrid Sprint #36, DataGrid Sprint #37, DataGrid Sprint #38, DataGrid Sprint #39 (was: DataGrid Sprint #36, DataGrid Sprint #37, DataGrid Sprint #38)
> GlobalJmxStatisticsConfiguration.allowDuplicateDomains is not implemented atomically and can fail frequently
> ------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-10887
> URL: https://issues.redhat.com/browse/ISPN-10887
> Project: Infinispan
> Issue Type: Bug
> Components: JMX, reporting and management
> Affects Versions: 10.0.0.Final
> Reporter: Nistor Adrian
> Assignee: Nistor Adrian
> Priority: Major
> Fix For: 10.1.0.Beta1
>
>
> The feature is supposed to allow detection of an already occupied jmx domain and generate a new unique one by adding an increasing number suffix. The implementation polls the MBeanServer registry using queryNames until a free domain is found. The unique domain generation and the registration of the first MBean to occupy it is not atomic, so multiple CacheManagers starting up concurrently can mistakenly pick up the same domain.
> This can be fixed by making it atomic, by not using queryNames and instead directly registering the first bean, retrying if InstanceAlreadyExistsException. The first bean has to be picked identically by all cache managers. We'll pick first the cache manager itself and register the other components afterwards.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
4 years, 11 months