[infinispan-issues] [JBoss JIRA] (ISPN-11566) StatefulSetRollingUpgradeTest failing on CI
Ryan Emerson (Jira)
issues at jboss.org
Tue Apr 14 08:46:55 EDT 2020
[ https://issues.redhat.com/browse/ISPN-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034464#comment-14034464 ]
Ryan Emerson commented on ISPN-11566:
-------------------------------------
Dan: @Ryan Emerson this is what I got from the log:
19:48:14,672 TRACE (testng-Test:[]) [TopologyManagementHelper] Attempting to execute command on self: StableTopologyUpdateCommand{cacheName='testCache', origin=Test-NodeD, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
19:48:14,686 INFO (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}
Dan: The old coordinator (NodeD) sent the topology update and the stable topology update, and NodeE received both
Dan: But NodeE might have ignored the stable topology update because it was processed after it got the new cluster view, with itself as the coordinator
Dan: Although I need to look further, because I don't see any log about ignoring the stable topology update
Dan: The proper fix is probably to get rid of the StableTopologyUpdateCommand
Dan: And either compute deterministically whether a cache topology is stable or not on each node, based on the phase and actual members
Dan: Or add a stable=true/false field in CacheTopology or in TopologyUpdateCommand
Dan: I've been thinking about doing this for a while, but I don't remember if I ever got so far as to create a JIRA
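[Editor's illustration, not part of the original chat] The two alternatives Dan describes can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `StableTopologySketch`, `Topology`, `TopologyUpdate`, `computeStable` and the `REBALANCE_IN_PROGRESS` phase are hypothetical stand-ins, not Infinispan classes, and the stability rule (no rebalance in progress, and the owners match the actual members) is an assumed reading of "based on the phase and actual members".

```java
import java.util.List;

// Hypothetical sketch: simplified stand-ins, not Infinispan's real classes.
final class StableTopologySketch {

    // Stand-in for the rebalance phase; only NO_REBALANCE appears in the log above.
    enum Phase { NO_REBALANCE, REBALANCE_IN_PROGRESS }

    // Only the fields this sketch needs.
    static final class Topology {
        final int id;
        final Phase phase;
        final List<String> owners;         // members of the current consistent hash
        final List<String> actualMembers;  // members currently present in the cache

        Topology(int id, Phase phase, List<String> owners, List<String> actualMembers) {
            this.id = id;
            this.phase = phase;
            this.owners = owners;
            this.actualMembers = actualMembers;
        }
    }

    // Option 1: every node derives stability from the topology itself.
    // Assumed rule: no rebalance in progress and owners == actual members.
    static boolean computeStable(Topology t) {
        return t.phase == Phase.NO_REBALANCE
                && t.owners.size() == t.actualMembers.size()
                && t.actualMembers.containsAll(t.owners);
    }

    // Option 2: the regular topology update carries an explicit flag,
    // so no separate stable-topology command is needed.
    static final class TopologyUpdate {
        final Topology topology;
        final boolean stable;

        TopologyUpdate(Topology topology, boolean stable) {
            this.topology = topology;
            this.stable = stable;
        }
    }
}
```

Under the assumed rule, topology 27 from the log (phase NO_REBALANCE, owner and actual member both Test-NodeE) would count as stable on every node without any extra message.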
Dan: I found the ignore log:
19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
Dan: There's still a chance that the coordinator will shut down before sending the topology update command to any of the other nodes, but it's small
Dan: It's much more likely that a separate stable topology update command will be ignored because it is queued by LocalTopologyManagerImpl after the regular topology update command, and the regular topology update has a lot of processing to do in StateConsumerImpl
Dan: Very similar to ISPN-9270
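[Editor's illustration, not part of the original chat] The race described above can be modelled in a few lines. This is a hedged sketch, not the actual LocalTopologyManagerImpl logic: the `Node` class and its methods are hypothetical, and comparing the sender's view id with the local view id is an assumed simplification of how an update "from old coordinator" gets detected. The numbers (view 7 to 8, stable topology 25, topology 27) are taken from the log.

```java
// Hypothetical model of the receiving node (Test-NodeE in the log).
final class OldCoordinatorRaceSketch {

    static final class Node {
        int viewId;
        int currentTopologyId;
        int stableTopologyId;

        Node(int viewId, int stableTopologyId) {
            this.viewId = viewId;
            this.currentTopologyId = stableTopologyId;
            this.stableTopologyId = stableTopologyId;
        }

        // A new cluster view arrives, making this node the coordinator.
        void installView(int newViewId) {
            viewId = Math.max(viewId, newViewId);
        }

        // Regular topology update sent as its own command.
        boolean onTopologyUpdate(int senderViewId, int topologyId) {
            if (senderViewId < viewId) {
                return false; // sender is an old coordinator: ignore
            }
            currentTopologyId = topologyId;
            return true;
        }

        // Separate stable topology update, queued after the regular one.
        boolean onStableTopologyUpdate(int senderViewId, int topologyId) {
            if (senderViewId < viewId) {
                return false; // ignored, so the stable topology stays behind
            }
            stableTopologyId = topologyId;
            return true;
        }

        // Alternative: one update that also carries the stable flag.
        boolean onTopologyUpdate(int senderViewId, int topologyId, boolean stable) {
            if (!onTopologyUpdate(senderViewId, topologyId)) {
                return false;
            }
            if (stable) {
                stableTopologyId = topologyId;
            }
            return true;
        }
    }
}
```

If a node starts with `new Node(7, 25)`, accepts `onTopologyUpdate(7, 27)`, then installs view 8, a later `onStableTopologyUpdate(7, 27)` is rejected and the node is left with current topology 27 but stable topology 25, the same mismatch that PreferConsistencyStrategy reports. With the three-argument variant, both values are applied in one step before the view change.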
Ryan Emerson: Thanks for looking into this @Dan
Ryan Emerson: I think removing the stable update command and using a boolean in TopologyUpdateCommand instead is probably the simplest solution
Ryan Emerson: deterministically computing the stable topology seems like it might have a few gotchas (gut feeling, I haven't properly thought about it)
Ryan Emerson: We could always update TopologyUpdateCommand and then investigate the latter if there is time in 11.x
Dan: +1
> StatefulSetRollingUpgradeTest failing on CI
> -------------------------------------------
>
> Key: ISPN-11566
> URL: https://issues.redhat.com/browse/ISPN-11566
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 11.0.0.Dev03
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v7.13.8#713008)