[infinispan-issues] [JBoss JIRA] (ISPN-2778) When a cache is restarted, the LEAVE and JOIN commands are not ordered

Dan Berindei (JIRA) jira-events at lists.jboss.org
Thu Jan 31 03:28:51 EST 2013


Dan Berindei created ISPN-2778:
----------------------------------

             Summary: When a cache is restarted, the LEAVE and JOIN commands are not ordered
                 Key: ISPN-2778
                 URL: https://issues.jboss.org/browse/ISPN-2778
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.2.0.CR3
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 5.2.0.Final


The LEAVE command is sent asynchronously, so if the cache is restarted it is possible for the new JOIN command to be processed before the LEAVE command on the coordinator.

This doesn't work out very well: as the joining node is already present in the consistent hash during join, it won't do any state transfer. After that, it will receive a topology update with itself removed from the consistent hash.

I have seen one failure because of this in {{StateTransferFunctionalTest.testInitialStateTransferAfterRestart}}:

{noformat}
03:25:36,749 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=0
03:25:36,770 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory at 335703e5, hashFunction=org.infinispan.commons.hash.MurmurHash3 at 64b6f0a5, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=SYNCHRONOUS, timeout=240000
03:25:36,771 TRACE (OOB-1,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory at 3aea6b42, hashFunction=org.infinispan.commons.hash.MurmurHash3 at 7427d845, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,771 TRACE (testng-StateTransferFunctionalTest:) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=2, currentCH=ReplicatedConsistentHash{members=[NodeG-42396, NodeH-44562]}, pendingCH=null} on cache nbst
03:25:36,782 TRACE (OOB-2,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,840 TRACE (OOB-2,ISPN,NodeG-42396:nbst nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
03:25:36,852 TRACE (OOB-2,ISPN,NodeH-44562:nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
{noformat}

The solution is be to make the LEAVE command synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


More information about the infinispan-issues mailing list