[JBoss JIRA] (ISPN-11510) Convert detection of blocking or non blocking threads
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11510?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-11510:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Convert detection of blocking or non blocking threads
> -----------------------------------------------------
>
> Key: ISPN-11510
> URL: https://issues.redhat.com/browse/ISPN-11510
> Project: Infinispan
> Issue Type: Sub-task
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 11.0.0.Dev04
>
>
> Unfortunately, detecting a blocking thread by name is very fragile, because a user can change the thread names.
> Also, a non-blocking thread is detected by checking the implementation class of the thread. This will not work on Java 15 with Loom, as we may not have a thread but rather a fiber.
> We should instead detect both kinds of thread by using a thread group, which will work for both and be more explicit.
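
A minimal sketch of the thread-group idea described in the issue, assuming marker subclasses of ThreadGroup. The names ThreadGroupDetection, BlockingGroup, NonBlockingGroup, and probe are hypothetical and are not taken from the Infinispan code base. The sketch only covers ordinary java.lang.Thread instances.

```java
class ThreadGroupDetection {
    // Hypothetical marker groups; the group type, not the thread name, carries the meaning.
    static final class BlockingGroup extends ThreadGroup {
        BlockingGroup(String name) { super(name); }
    }

    static final class NonBlockingGroup extends ThreadGroup {
        NonBlockingGroup(String name) { super(name); }
    }

    // True when the calling thread was created in a BlockingGroup.
    static boolean isBlockingThread() {
        return Thread.currentThread().getThreadGroup() instanceof BlockingGroup;
    }

    // True when the calling thread was created in a NonBlockingGroup.
    static boolean isNonBlockingThread() {
        return Thread.currentThread().getThreadGroup() instanceof NonBlockingGroup;
    }

    // Runs both checks on a new thread in the given group and returns
    // {isBlocking, isNonBlocking}. The thread name is arbitrary on purpose.
    static boolean[] probe(ThreadGroup group, String threadName) {
        boolean[] result = new boolean[2];
        Thread t = new Thread(group, () -> {
            result[0] = isBlockingThread();
            result[1] = isNonBlockingThread();
        }, threadName);
        t.start();
        try {
            t.join(); // join() makes the writes to result visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] b = probe(new BlockingGroup("blocking"), "renamed-by-user");
        boolean[] n = probe(new NonBlockingGroup("non-blocking"), "renamed-by-user");
        System.out.println("blocking group:     blocking=" + b[0] + ", nonBlocking=" + b[1]);
        System.out.println("non-blocking group: blocking=" + n[0] + ", nonBlocking=" + n[1]);
    }
}
```

Because the check looks only at the type of the thread's group, renaming the thread does not change the result, which is the fragility the issue describes for name-based detection.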
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11609) StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut[SCATTERED_SYNC] random failures
by Dan Berindei (Jira)
Dan Berindei created ISPN-11609:
-----------------------------------
Summary: StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut[SCATTERED_SYNC] random failures
Key: ISPN-11609
URL: https://issues.redhat.com/browse/ISPN-11609
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite
Affects Versions: 11.0.0.Dev03
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 11.0.0.Final
{noformat}
java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_StateTransferOverwritingValueTest-NodeB
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50)
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40)
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201)
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut(StateTransferOverwritingValueTest.java:96)
{noformat}
[JBoss JIRA] (ISPN-11608) SoftIndexFileStoreTest.testOverrideWithExpirableAndCompaction random failures
by Dan Berindei (Jira)
Dan Berindei created ISPN-11608:
-----------------------------------
Summary: SoftIndexFileStoreTest.testOverrideWithExpirableAndCompaction random failures
Key: ISPN-11608
URL: https://issues.redhat.com/browse/ISPN-11608
Project: Infinispan
Issue Type: Bug
Components: Loaders and Stores, Test Suite
Affects Versions: 11.0.0.Dev03, 10.1.5.Final
Reporter: Dan Berindei
Fix For: 11.0.0.Final
SIFS is not restarting properly, maybe because the expiration worker thread was writing to the store while it was stopping?
{noformat}
org.infinispan.persistence.spi.PersistenceException: ISPN029019: Cannot load key key from index.
at org.infinispan.persistence.sifs.SoftIndexFileStore.loadEntry(SoftIndexFileStore.java:474)
at org.infinispan.persistence.sifs.SoftIndexFileStoreTest.testOverrideWithExpirableAndCompaction(SoftIndexFileStoreTest.java:177)
Caused by: org.infinispan.persistence.spi.PersistenceException: ISPN029020: Index looks corrupt.
at org.infinispan.persistence.sifs.IndexNode.applyOnLeaf(IndexNode.java:288)
at org.infinispan.persistence.sifs.Index.getRecord(Index.java:86)
at org.infinispan.persistence.sifs.SoftIndexFileStore.loadEntry(SoftIndexFileStore.java:467)
... 24 more
Caused by: org.infinispan.persistence.sifs.IndexNode$IndexNodeOutdatedException: 8:709
at org.infinispan.persistence.sifs.IndexNode$LeafNode.loadRecord(IndexNode.java:978)
at org.infinispan.persistence.sifs.IndexNode$ReadOperation$1.apply(IndexNode.java:219)
at org.infinispan.persistence.sifs.IndexNode$ReadOperation$1.apply(IndexNode.java:216)
at org.infinispan.persistence.sifs.IndexNode.applyOnLeaf(IndexNode.java:284)
... 26 more
{noformat}
[JBoss JIRA] (ISPN-11605) Remove TopologyUpdateStableCommand
by Dan Berindei (Jira)
Dan Berindei created ISPN-11605:
-----------------------------------
Summary: Remove TopologyUpdateStableCommand
Key: ISPN-11605
URL: https://issues.redhat.com/browse/ISPN-11605
Project: Infinispan
Issue Type: Task
Components: Core
Affects Versions: 11.0.0.Dev03
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 11.0.0.Dev04
Having a separate command for stable topologies means nodes may receive a topology update that should be stable without also receiving the confirmation that the topology is stable, complicating cluster recovery.
As an example, when the coordinator node shuts down, it first installs a new cache topology without itself for all of the running caches.
If the number of remaining owners is {{< numOwners}}, no rebalance is needed, and the old coordinator immediately sends a stable topology update as well.
However, any topology update (stable or not) from the old coordinator that has not yet been processed when the {{CacheStatusRequestCommand}} from the new coordinator arrives will be ignored.
If the cluster has only 2 nodes, as in {{StatefulSetRollingUpgradeTest}} and {{StatefulSetRollingUpgradeIT}}, the stable topology update is vital to keep the cache in {{AVAILABLE}} mode; without it, the cache goes into {{DEGRADED}} mode and no new nodes can join.
{noformat}
19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE, rebalanceId=5, topologyId=27, viewId=7}
19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Released cache status testCache
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Released cache status testCache
19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
org.infinispan.commons.CacheException: Initial state transfer timed out for cache testCache on Test-NodeF
at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]
{noformat}
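
The race described above can be modelled with a small, self-contained sketch. Everything in it is hypothetical (StableTopologyRace, LocalState, and the method names are illustrative, not Infinispan classes), and the single-message variant with a stable flag is only one possible alternative; the issue does not say how the fix is implemented. The topology ids 25 and 27 mirror the log above.

```java
class StableTopologyRace {
    // Hypothetical, simplified stand-in for one node's local view of a cache.
    static final class LocalState {
        int currentTopologyId;
        int stableTopologyId;
        // Set once the new coordinator's status request has been handled;
        // later messages from the old coordinator are then ignored.
        boolean ignoreOldCoordinator;

        LocalState(int currentTopologyId, int stableTopologyId) {
            this.currentTopologyId = currentTopologyId;
            this.stableTopologyId = stableTopologyId;
        }

        // Separate-command variant: the topology and its stable confirmation
        // arrive as two independent messages.
        void onTopologyUpdate(int topologyId) {
            if (ignoreOldCoordinator) return;
            currentTopologyId = topologyId;
        }

        void onStableTopologyUpdate(int topologyId) {
            if (ignoreOldCoordinator) return;
            stableTopologyId = topologyId;
        }

        // Single-message variant (an assumption): the update carries a stable flag,
        // so both ids are applied together or not at all.
        void onCombinedUpdate(int topologyId, boolean stable) {
            if (ignoreOldCoordinator) return;
            currentTopologyId = topologyId;
            if (stable) stableTopologyId = topologyId;
        }

        void onStatusRequestFromNewCoordinator() {
            ignoreOldCoordinator = true;
        }
    }

    // The status request lands between the two messages, as in the log.
    static LocalState separateCommands() {
        LocalState s = new LocalState(25, 25);
        s.onTopologyUpdate(27);
        s.onStatusRequestFromNewCoordinator();
        s.onStableTopologyUpdate(27); // ignored: sender is the old coordinator
        return s;
    }

    // With a single message there is no window between the two halves.
    static LocalState combinedCommand() {
        LocalState s = new LocalState(25, 25);
        s.onCombinedUpdate(27, true);
        s.onStatusRequestFromNewCoordinator();
        return s;
    }

    public static void main(String[] args) {
        LocalState a = separateCommands();
        LocalState b = combinedCommand();
        System.out.println("separate: current=" + a.currentTopologyId + ", stable=" + a.stableTopologyId);
        System.out.println("combined: current=" + b.currentTopologyId + ", stable=" + b.stableTopologyId);
    }
}
```

In the separate-command run the node reports current topology 27 but stable topology 25, which matches the "Max stable partition topology" and "Max active partition topology" lines in the log.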