Tristan Tarrant updated ISPN-5883:
----------------------------------
Fix Version/s: 8.1.4.Final
Node can apply new topology after sending status response
---------------------------------------------------------
Key: ISPN-5883
URL:
https://issues.jboss.org/browse/ISPN-5883
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Labels: testsuite_stability
Fix For: 8.2.0.Beta1, 8.1.4.Final
{{LocalTopologyManagerImpl}} is responsible for sending the
{{ClusterTopologyControlCommand(GET_STATUS)}} response, and when it sends the response it
doesn't check the current view id against the new coordinator's view id. If the
old coordinator already sent a topology update before the merge, that topology update
might be processed after sending the status response. The new coordinator will send a
topology update with a topology id of {{max(status response topology ids) + 1}}. The node
will then process the topology update from the old coordinator, but it will ignore the
topology update from the new coordinator with the same topology id.
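The topology id collision described above can be sketched with a simplified, hypothetical model (the class and method names below are illustrative, not Infinispan's actual {{LocalTopologyManagerImpl}} API): a node keeps the highest topology id it has applied and drops any update whose id is not strictly greater.

```java
// Simplified, hypothetical model of a node's topology-update handling.
class TopologyTracker {
    private int currentTopologyId;

    TopologyTracker(int initialId) {
        this.currentTopologyId = initialId;
    }

    // An update is applied only if its id is newer than the current one;
    // otherwise it is dropped ("Ignoring late consistent hash update").
    boolean applyUpdate(int topologyId) {
        if (topologyId <= currentTopologyId) {
            return false;
        }
        currentTopologyId = topologyId;
        return true;
    }

    int statusResponseTopologyId() {
        return currentTopologyId;
    }
}

public class TopologyRace {
    public static void main(String[] args) {
        TopologyTracker node = new TopologyTracker(7);

        // 1. The node answers GET_STATUS while still on topology 7.
        int reported = node.statusResponseTopologyId();

        // 2. The new coordinator picks max(reported ids) + 1 = 8 for its update.
        int newCoordinatorTopologyId = reported + 1;

        // 3. The old coordinator's pre-merge update (also id 8) arrives first.
        boolean oldApplied = node.applyUpdate(8);

        // 4. The new coordinator's update has the same id and is dropped.
        boolean newApplied = node.applyUpdate(newCoordinatorTopologyId);

        System.out.println(oldApplied + " " + newApplied); // true false
    }
}
```

Both updates carry id 8, so whichever arrives first wins; in the race above it is the stale one from the old coordinator.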
This is especially common in the partition handling tests, e.g. the
{{BasePessimisticTxPartitionAndMergeTest}} subclasses, because the test
"injects" the JGroups view on each node serially, and the 4th node often sends
the status response before it receives the new view.
{noformat}
22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending
cluster status response for view 10
// Topology from NodeC
22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating
local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3,
currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846:
15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null,
actualMembers=[NodeC-46467, NodeD-30486]}
// Later, topology from NodeA
22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring
late consistent hash update for cache pes-cache, current topology is 8:
CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners =
(4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]},
pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467,
NodeD-30486]}
{noformat}
As a solution, we can delay sending the status response until we have the same view as
the coordinator (or a later one). We already check that the sender is the current
coordinator before applying a topology update, so this will guarantee that we
don't apply further topology updates from the old coordinator. Since the status request
is only sent after the new view has been installed, this will not introduce any delay in
the vast majority of cases.
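The proposed delay could look roughly like the following sketch (hypothetical helper names; the real change would live in {{LocalTopologyManagerImpl}}'s status-response path): the view listener bumps a view-id counter, and the status response blocks until the local view id is at least the requesting coordinator's.

```java
// Hypothetical gate deferring the GET_STATUS response until the local node
// has installed the coordinator's view (or a later one).
public class StatusResponseGate {
    static class ViewGate {
        private int viewId;

        // Called from the (simulated) JGroups view listener.
        synchronized void viewInstalled(int newViewId) {
            viewId = Math.max(viewId, newViewId); // view ids never regress
            notifyAll(); // wake any thread waiting to send a status response
        }

        // Called before sending the status response; blocks until the local
        // view is at least as recent as the coordinator's.
        synchronized void awaitView(int coordinatorViewId) throws InterruptedException {
            while (viewId < coordinatorViewId) {
                wait();
            }
        }

        synchronized int currentViewId() {
            return viewId;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ViewGate gate = new ViewGate();

        // Simulate the view listener installing view 10 concurrently.
        Thread viewListener = new Thread(() -> gate.viewInstalled(10));
        viewListener.start();

        // The status response for the coordinator's view 10 is only sent
        // once view 10 (or later) is installed locally.
        gate.awaitView(10);
        viewListener.join();
        System.out.println("status response sent for view " + gate.currentViewId());
    }
}
```

Because the topology update from the old coordinator would arrive under the old (smaller) view id, waiting here closes the window in which it could still be applied after the status response.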