[JBoss JIRA] (ISPN-2778) When a cache is restarted, the LEAVE and JOIN commands are not ordered

Thursday, 31 January 2013

Dan Berindei created ISPN-2778:
----------------------------------

             Summary: When a cache is restarted, the LEAVE and JOIN commands are not
ordered
                 Key: ISPN-2778
                 URL: https://issues.jboss.org/browse/ISPN-2778
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.2.0.CR3
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 5.2.0.Final

The LEAVE command is sent asynchronously, so if the cache is restarted it is possible for
the new JOIN command to be processed before the LEAVE command on the coordinator.

This breaks the restart: because the joining node is still present in the
consistent hash when the JOIN is processed, it won't do any state transfer. Once the
delayed LEAVE is processed, it will receive a topology update with itself removed from
the consistent hash.
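
The effect of the two processing orders can be reproduced with a small model. This is
only a sketch: LeaveJoinOrdering and its methods are hypothetical names invented for
illustration and are not Infinispan classes. It assumes the coordinator keeps a plain
member list and ignores a JOIN from a node that is already a member.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the coordinator's member list; not Infinispan code.
final class LeaveJoinOrdering {
    private final List<String> members = new ArrayList<>();

    LeaveJoinOrdering(List<String> initialMembers) {
        members.addAll(initialMembers);
    }

    // Returns true when the node was added and would therefore request state.
    boolean join(String node) {
        if (members.contains(node)) {
            return false; // already in the consistent hash: no state transfer
        }
        members.add(node);
        return true;
    }

    // Removes the node from the member list if it is present.
    void leave(String node) {
        members.remove(node);
    }

    List<String> members() {
        return new ArrayList<>(members);
    }

    // Simulates NodeH restarting its cache while NodeG acts as coordinator.
    static List<String> membersAfterRestart(boolean leaveProcessedFirst) {
        LeaveJoinOrdering coordinator =
                new LeaveJoinOrdering(List.of("NodeG", "NodeH"));
        if (leaveProcessedFirst) {
            coordinator.leave("NodeH");
            coordinator.join("NodeH");
        } else {
            // The reordering described in this issue: JOIN overtakes LEAVE.
            coordinator.join("NodeH");
            coordinator.leave("NodeH");
        }
        return coordinator.members();
    }

    public static void main(String[] args) {
        System.out.println("LEAVE then JOIN: " + membersAfterRestart(true));
        System.out.println("JOIN then LEAVE: " + membersAfterRestart(false));
    }
}
```

When the JOIN is handled first, the model ends with only NodeG as a member, which
matches the final topology in the log below.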

I have seen one failure because of this in
{{StateTransferFunctionalTest.testInitialStateTransferAfterRestart}}:

{noformat}
03:25:36,749 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport]
dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=LEAVE,
sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=0
03:25:36,770 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport]
dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=JOIN,
sender=NodeH-44562,
joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@335703e5,
hashFunction=org.infinispan.commons.hash.MurmurHash3@64b6f0a5, numSegments=60,
numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1}, mode=SYNCHRONOUS, timeout=240000
03:25:36,771 TRACE (OOB-1,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to
execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=JOIN,
sender=NodeH-44562,
joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@3aea6b42,
hashFunction=org.infinispan.commons.hash.MurmurHash3@7427d845, numSegments=60,
numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,771 TRACE (testng-StateTransferFunctionalTest:) [StateTransferManagerImpl]
Installing new cache topology CacheTopology{id=2,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396, NodeH-44562]}, pendingCH=null} on
cache nbst
03:25:36,782 TRACE (OOB-2,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to
execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=LEAVE,
sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,840 TRACE (OOB-2,ISPN,NodeG-42396:nbst nbst) [StateTransferManagerImpl]
Installing new cache topology CacheTopology{id=3,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
03:25:36,852 TRACE (OOB-2,ISPN,NodeH-44562:nbst) [StateTransferManagerImpl] Installing new
cache topology CacheTopology{id=3,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
{noformat}

The solution is to make the LEAVE command synchronous.
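
As a rough illustration of why waiting for the LEAVE restores the ordering, the sketch
below uses plain java.util.concurrent rather than Infinispan's RPC layer. The class
SynchronousLeave, the two-thread pool standing in for the coordinator's OOB threads and
the 50 ms delay are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; the executor stands in for the coordinator's threads.
final class SynchronousLeave {

    static List<String> restartWithSynchronousLeave() {
        CopyOnWriteArrayList<String> members =
                new CopyOnWriteArrayList<>(List.of("NodeG", "NodeH"));
        // Two threads, so commands could overtake each other if not awaited.
        ExecutorService coordinator = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> leave = coordinator.submit(() -> {
                Thread.sleep(50); // simulate slow delivery of the LEAVE
                return members.remove("NodeH");
            });
            // Synchronous LEAVE: block until the coordinator has processed it.
            leave.get();

            Future<Boolean> join =
                    coordinator.submit(() -> members.addIfAbsent("NodeH"));
            join.get();
            return new ArrayList<>(members);
        } catch (Exception e) {
            throw new IllegalStateException("simulated command failed", e);
        } finally {
            coordinator.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(restartWithSynchronousLeave());
    }
}
```

Because the sender blocks on the LEAVE before issuing the JOIN, the JOIN cannot be
processed first in this model, even though the LEAVE is delayed.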

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
