[infinispan-issues] [JBoss JIRA] (ISPN-4776) The topology id for the merged cache topology is not always bigger than all the partition topology ids

Fri Sep 26 10:09:02 EDT 2014

Dan Berindei created ISPN-4776:
----------------------------------

             Summary: The topology id for the merged cache topology is not always bigger than all the partition topology ids
                 Key: ISPN-4776
                 URL: https://issues.jboss.org/browse/ISPN-4776
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 7.0.0.Beta2
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Blocker
             Fix For: 7.0.0.CR1

With the ISPN-4574 fix, I changed the merge algorithm to pick the partition with the most members (both in the _stable_ topology and in the _current_ topology) instead of the partition with the highest topology id.
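A minimal Java sketch of the two selection rules, for illustration only. The `Partition` record and both methods are hypothetical simplifications, not Infinispan's `ClusterCacheStatus` code. The way stable and current member counts are combined is also an assumption: stable members are compared first, and current members break ties.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class PartitionSelectionSketch {
    // Hypothetical, simplified view of one partition's topologies.
    record Partition(String name, int currentTopologyId,
                     Set<String> stableMembers, Set<String> currentMembers) {}

    // Old rule as described above: the partition with the highest topology id wins.
    static Partition byHighestTopologyId(List<Partition> partitions) {
        return partitions.stream()
                .max(Comparator.comparingInt(Partition::currentTopologyId))
                .orElseThrow();
    }

    // New rule (assumed ordering): most stable members, then most current members.
    static Partition byMostMembers(List<Partition> partitions) {
        return partitions.stream()
                .max(Comparator.comparingInt((Partition p) -> p.stableMembers().size())
                        .thenComparingInt(p -> p.currentMembers().size()))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Hypothetical input, not taken from the failing test.
        List<Partition> partitions = List.of(
                new Partition("P1", 8, Set.of("A"), Set.of("A")),
                new Partition("P2", 5, Set.of("A", "B", "C"), Set.of("B")));
        System.out.println(byHighestTopologyId(partitions).name()); // P1
        System.out.println(byMostMembers(partitions).name());       // P2
    }
}
```

With these inputs the old rule picks P1 (topology id 8), while the member-count rule picks P2 (topology id 5). The selected partition can therefore have a lower topology id than another partition.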

However, the biggest partition does not necessarily have the highest topology id, so some nodes may ignore the merged topology because they already have a topology with a higher id installed. This happened once in ClusterTopologyManagerTest.testClusterRecoveryAfterThreeWaySplit:

{noformat}
00:24:59,286 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Recovered 3 partition(s) for cache cache: [CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeL-25322: 60+0]}, pendingCH=null, unionCH=null}, CacheTopology{id=6, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+10, NodeN-6727: 30+10]}, pendingCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+30, NodeN-6727: 30+30]}, unionCH=null}, CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}]
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Updating topologies after merge for cache cache, current topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, stable topology = CacheTopology{id=4, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (3)[, NodeL-25322: 20+20, NodeM-12972: 20+20, NodeN-6727: 20+20]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cache, topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,288 TRACE (transport-thread-NodeL-p33097-t3:) [LocalTopologyManagerImpl] Ignoring consistent hash update for cache cache, current topology is 8: CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}
{noformat}
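The ignore rule, and one way to restore the invariant from the summary, can be sketched in Java as follows. This is a hypothetical sketch. `accepts` assumes a node only installs a topology whose id is strictly greater than the one it already has, which matches the TRACE line above. `mergedTopologyId` simply takes the highest partition topology id plus one; it is not necessarily how the actual fix was implemented. The ids 8, 6 and 5 come from the "Recovered 3 partition(s)" log line.

```java
import java.util.List;

public class MergedTopologyIdSketch {
    // Hypothetical: a node applies an update only if its id is newer than the installed topology id.
    static boolean accepts(int installedTopologyId, int updateTopologyId) {
        return updateTopologyId > installedTopologyId;
    }

    // True if every node, whatever partition topology it has installed, would apply the update.
    static boolean acceptedByAll(List<Integer> installedIds, int updateTopologyId) {
        return installedIds.stream().allMatch(id -> accepts(id, updateTopologyId));
    }

    // Make the merged topology id strictly greater than every partition's topology id.
    static int mergedTopologyId(List<Integer> partitionIds) {
        int max = partitionIds.stream().mapToInt(Integer::intValue).max().orElseThrow();
        return max + 1;
    }

    public static void main(String[] args) {
        List<Integer> partitionIds = List.of(8, 6, 5); // partition topology ids from the log
        int selectedId = 5;                            // id of the topology picked by the merge
        System.out.println(acceptedByAll(partitionIds, selectedId)); // false: a node on topology 8 ignores id 5
        int merged = mergedTopologyId(partitionIds);
        System.out.println(merged);                                  // 9
        System.out.println(acceptedByAll(partitionIds, merged));     // true
    }
}
```

Taking the maximum over all partitions keeps the member-count selection rule intact while still guaranteeing that every node sees the merged topology as newer than the one it has installed.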

Failure logs here: http://ci.infinispan.org/viewLog.html?buildId=12364&buildTypeId=Infinispan_MasterHotspotJdk7trac&tab=buildResultsDiv

--
This message was sent by Atlassian JIRA
(v6.3.1#6329)