[JBoss JIRA] (ISPN-11634) Merge TopologyUpdateStableCommand and TopologyUpdateCommand
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11634?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11634:
-----------------------------------
Sprint: DataGrid Sprint #43, DataGrid Sprint #44, DataGrid Sprint #45 (was: DataGrid Sprint #43, DataGrid Sprint #44)
> Merge TopologyUpdateStableCommand and TopologyUpdateCommand
> -----------------------------------------------------------
>
> Key: ISPN-11634
> URL: https://issues.redhat.com/browse/ISPN-11634
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 11.0.0.Dev04
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 11.0.0.Final
>
>
> {code:java}
> 19:48:14,672 TRACE (testng-Test:[]) [TopologyManagementHelper] Attempting to execute command on self: StableTopologyUpdateCommand{cacheName='testCache', origin=Test-NodeD, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
> {code}
> The old coordinator (NodeD) sent the topology update and the stable topology update, and NodeE received both. However, NodeE may ignore the stable topology update because it is processed after NodeE receives the new cluster view, in which NodeE itself is the coordinator. The proper fix is to get rid of the StableTopologyUpdateCommand and add a stable=true/false field in CacheTopology or in TopologyUpdateCommand.
> There's still a small chance that the coordinator will shut down before sending the topology update command to any of the other nodes. It's much more likely that a separate stable topology update command will be ignored, because it is queued by LocalTopologyManagerImpl after the regular topology update command, and the regular topology update has a lot of processing to do in StateConsumerImpl.
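The race described in this issue can be sketched with a toy model. This is not Infinispan code: the Receiver class, its method names, and the rule of dropping any command whose view id is older than the local one are assumptions made for illustration. Only the ids (topologies 25 and 27, views 7 and 8) are taken from the log above.

```java
import java.util.Arrays;

class MergedTopologyUpdateSketch {

    // Hypothetical receiver, loosely modelled on the behaviour seen in the log.
    static final class Receiver {
        int viewId = 7;              // cluster view this node currently knows
        int currentTopologyId = 25;  // last installed topology
        int stableTopologyId = 25;   // last topology confirmed as stable

        // Assumption: a command sent under an older view comes from the old coordinator.
        private boolean fromOldCoordinator(int commandViewId) {
            return commandViewId < viewId;
        }

        // Regular update; 'stable' stands for the proposed stable=true/false field.
        void onTopologyUpdate(int topologyId, int commandViewId, boolean stable) {
            if (fromOldCoordinator(commandViewId)) {
                return; // ignored
            }
            currentTopologyId = topologyId;
            if (stable) {
                stableTopologyId = topologyId;
            }
        }

        // Separate stable update, as sent before the merge.
        void onStableTopologyUpdate(int topologyId, int commandViewId) {
            if (fromOldCoordinator(commandViewId)) {
                return; // ignored
            }
            stableTopologyId = topologyId;
        }
    }

    // Two commands: the second is processed only after the view change.
    static int[] separateCommands() {
        Receiver r = new Receiver();
        r.onTopologyUpdate(27, 7, false);
        r.viewId = 8;                    // new cluster view arrives in between
        r.onStableTopologyUpdate(27, 7); // dropped: sent under view 7
        return new int[] {r.currentTopologyId, r.stableTopologyId};
    }

    // One command that carries the stable flag with the topology.
    static int[] mergedCommand() {
        Receiver r = new Receiver();
        r.onTopologyUpdate(27, 7, true);
        r.viewId = 8;
        return new int[] {r.currentTopologyId, r.stableTopologyId};
    }

    public static void main(String[] args) {
        System.out.println("separate: " + Arrays.toString(separateCommands()));
        System.out.println("merged:   " + Arrays.toString(mergedCommand()));
    }
}
```

In this model the separate commands leave the node with current topology 27 but stable topology 25, which matches the "Max stable" and "Max active" lines in the log. The merged command updates both together or not at all.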
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 10 months
[JBoss JIRA] (ISPN-5938) ClusterListenerReplInitialStateTest.testPrimaryOwnerGoesDownAfterBackupRaisesEvent fails randomly
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-5938?page=com.atlassian.jira.plugin... ]
Tristan Tarrant updated ISPN-5938:
----------------------------------
Fix Version/s: 11.0.0.Final
(was: 11.0.0.CR1)
> ClusterListenerReplInitialStateTest.testPrimaryOwnerGoesDownAfterBackupRaisesEvent fails randomly
> -------------------------------------------------------------------------------------------------
>
> Key: ISPN-5938
> URL: https://issues.redhat.com/browse/ISPN-5938
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 11.0.0.Alpha1
> Reporter: Roman Macor
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 11.0.0.Final
>
>
> ClusterListenerReplInitialStateTest.testPrimaryOwnerGoesDownAfterBackupRaisesEvent fails randomly with:
> Stacktrace
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> at org.infinispan.notifications.cachelistener.cluster.ClusterListenerReplTest.testPrimaryOwnerGoesDownAfterBackupRaisesEvent(ClusterListenerReplTest.java:123)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
[JBoss JIRA] (ISPN-11605) Remove TopologyUpdateStableCommand
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11605?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11605:
-----------------------------------
Fix Version/s: 11.0.0.Final
(was: 11.0.0.CR1)
> Remove TopologyUpdateStableCommand
> ----------------------------------
>
> Key: ISPN-11605
> URL: https://issues.redhat.com/browse/ISPN-11605
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 11.0.0.Dev03
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
> Fix For: 11.0.0.Final
>
>
> Having a separate command for stable topologies means nodes may receive a topology update that should be stable without also receiving the confirmation that the topology is stable, complicating cluster recovery.
> As an example, when the coordinator node shuts down, it first installs a new cache topology without itself for all of the running caches.
> If the number of remaining owners is {{< numOwners}}, no rebalance is needed, and the old coordinator immediately sends a stable topology update as well.
> But any (stable) topology updates from the old coordinator not yet processed by the time of the {{CacheStatusRequestCommand}} from the new coordinator will be ignored.
> If the cluster had only 2 nodes, as in {{StatefulSetRollingUpgradeTest}} and {{StatefulSetRollingUpgradeIT}}, the stable topology update is vital to keep the cache in {{AVAILABLE}} mode; without it, the cache goes into {{DEGRADED}} mode and no new nodes can join.
> {noformat}
> 19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE, rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
> 19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
> 19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
> 19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Released cache status testCache
> 19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
> 19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
> org.infinispan.commons.CacheException: Initial state transfer timed out for cache testCache on Test-NodeF
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
> at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
> at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
> at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
> at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]
> {noformat}
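The sketch below illustrates why the missing stable topology update matters in a 2-node cluster. It is a simplified model, not the actual PreferConsistencyStrategy: the class and method names are hypothetical, and it assumes a single rule, namely that a partition which has lost half or more of the stable topology's members becomes degraded. The real strategy may check further conditions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AvailabilityAfterLeaveSketch {

    enum Mode { AVAILABLE, DEGRADED_MODE }

    // Assumed rule: losing half or more of the stable members degrades the partition.
    static Mode decide(List<String> stableMembers, List<String> currentMembers) {
        Set<String> lost = new HashSet<>(stableMembers);
        lost.removeAll(currentMembers);
        boolean lostHalfOrMore = 2 * lost.size() >= stableMembers.size();
        return lostHalfOrMore ? Mode.DEGRADED_MODE : Mode.AVAILABLE;
    }

    public static void main(String[] args) {
        List<String> current = Arrays.asList("Test-NodeE");

        // Stable topology 25 still lists both nodes because the stable update was ignored.
        System.out.println(decide(Arrays.asList("Test-NodeD", "Test-NodeE"), current));

        // Had stable topology 27 been installed, only Test-NodeE would be expected.
        System.out.println(decide(Arrays.asList("Test-NodeE"), current));
    }
}
```

Under this assumed rule, the first call yields DEGRADED_MODE, as in the log, and the second yields AVAILABLE.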
[JBoss JIRA] (ISPN-11567) Scattered caches with a single node expire entries immediately
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11567?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11567:
-----------------------------------
Fix Version/s: 11.0.0.Final
(was: 11.0.0.CR1)
> Scattered caches with a single node expire entries immediately
> --------------------------------------------------------------
>
> Key: ISPN-11567
> URL: https://issues.redhat.com/browse/ISPN-11567
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 10.1.5.Final, 11.0.0.Dev03
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 11.0.0.Final
>
>
> {{ClusterExpirationManager.checkExpiredMaxIdle()}} sends a {{TouchCommand}} to the other owners and expires the entry locally if the touch was unsuccessful.
> {{ScatteredTouchResponseCollector}} is stateless, and reports that the entry has not been touched if it doesn't receive any {{true}} response. However, this is only correct if at least one backup entry exists on another node, and that is not always the case: e.g. between the backup node leaving the cluster and another node becoming a backup, or when the cluster has a single node.
> Since {{ClusterExpirationManager.checkExpiredMaxIdle()}} is called on every read, before the entry is expired on the local node, a transient entry in a scattered cache with a single node will expire on the first read, immediately after being inserted.
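The collector behaviour described here can be sketched as follows. The class and method names are hypothetical and the code does not use Infinispan's collector API. The second method is only one possible adjustment, shown for contrast; it is not claimed to be the fix that was applied.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TouchCollectorSketch {

    // Behaviour described in the issue: touched only if some backup answered true.
    static boolean touchedStateless(List<Boolean> backupResponses) {
        for (boolean response : backupResponses) {
            if (response) {
                return true;
            }
        }
        return false; // also the result when there were no backups to ask
    }

    // Assumed alternative: with no backup responses, nothing contradicts the local entry.
    static boolean touchedAwareOfMissingBackups(List<Boolean> backupResponses) {
        return backupResponses.isEmpty() || touchedStateless(backupResponses);
    }

    public static void main(String[] args) {
        List<Boolean> singleNode = Collections.emptyList();      // no other owners
        List<Boolean> oneBackup = Arrays.asList(Boolean.TRUE);   // backup touched the entry

        System.out.println("stateless, single node: " + touchedStateless(singleNode));
        System.out.println("adjusted, single node:  " + touchedAwareOfMissingBackups(singleNode));
        System.out.println("stateless, one backup:  " + touchedStateless(oneBackup));
    }
}
```

With an empty response list the stateless check reports the touch as unsuccessful, so the caller would expire the entry on the first read, which is the single-node case in the report.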