[infinispan-issues] [JBoss JIRA] (ISPN-2778) When a cache is restarted, the LEAVE and JOIN commands are not ordered
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Thu Jan 31 03:28:51 EST 2013
Dan Berindei created ISPN-2778:
----------------------------------
Summary: When a cache is restarted, the LEAVE and JOIN commands are not ordered
Key: ISPN-2778
URL: https://issues.jboss.org/browse/ISPN-2778
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 5.2.0.Final
The LEAVE command is sent asynchronously, so if the cache is restarted it is possible for the new JOIN command to be processed before the LEAVE command on the coordinator.
This breaks the restart: because the joining node is still present in the consistent hash when the JOIN is processed, it doesn't do any state transfer. Once the delayed LEAVE is processed, the node receives a topology update with itself removed from the consistent hash.
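To make the ordering problem concrete, here is a minimal sketch of the coordinator's membership handling. It is not Infinispan code: the class {{RestartOrderingSketch}} and its methods are hypothetical, and the model assumes only what is described above, namely that a JOIN from a node that is already a member triggers no state transfer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the coordinator's member list for one cache.
// None of these names are Infinispan classes or methods.
public class RestartOrderingSketch {
    private final List<String> members = new ArrayList<>();

    public RestartOrderingSketch(String... initialMembers) {
        members.addAll(Arrays.asList(initialMembers));
    }

    /** Returns true if the node was added, i.e. a state transfer would be started for it. */
    public boolean handleJoin(String node) {
        if (members.contains(node)) {
            return false; // already in the consistent hash: nothing to transfer
        }
        members.add(node);
        return true;
    }

    /** Removes the node from the member list if it is present. */
    public void handleLeave(String node) {
        members.remove(node);
    }

    public List<String> members() {
        return new ArrayList<>(members);
    }

    public static void main(String[] args) {
        // Intended order: LEAVE is processed first, then JOIN.
        RestartOrderingSketch ordered = new RestartOrderingSketch("NodeG", "NodeH");
        ordered.handleLeave("NodeH");
        boolean transferWhenOrdered = ordered.handleJoin("NodeH");
        System.out.println("ordered:   transfer=" + transferWhenOrdered + " members=" + ordered.members());

        // Reordered: JOIN overtakes the asynchronous LEAVE.
        RestartOrderingSketch reordered = new RestartOrderingSketch("NodeG", "NodeH");
        boolean transferWhenReordered = reordered.handleJoin("NodeH");
        reordered.handleLeave("NodeH");
        System.out.println("reordered: transfer=" + transferWhenReordered + " members=" + reordered.members());
    }
}
```

In the reordered case the restarted node gets no state and then disappears from the member list, which matches the symptom described above.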
I have seen one failure because of this in {{StateTransferFunctionalTest.testInitialStateTransferAfterRestart}}:
{noformat}
03:25:36,749 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=0
03:25:36,770 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@335703e5, hashFunction=org.infinispan.commons.hash.MurmurHash3@64b6f0a5, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=SYNCHRONOUS, timeout=240000
03:25:36,771 TRACE (OOB-1,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@3aea6b42, hashFunction=org.infinispan.commons.hash.MurmurHash3@7427d845, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,771 TRACE (testng-StateTransferFunctionalTest:) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=2, currentCH=ReplicatedConsistentHash{members=[NodeG-42396, NodeH-44562]}, pendingCH=null} on cache nbst
03:25:36,782 TRACE (OOB-2,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,840 TRACE (OOB-2,ISPN,NodeG-42396:nbst nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
03:25:36,852 TRACE (OOB-2,ISPN,NodeH-44562:nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
{noformat}
The solution is to make the LEAVE command synchronous.
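As a rough illustration of why a synchronous LEAVE restores the ordering, the sketch below uses plain {{java.util.concurrent}} rather than the Infinispan transport. The class {{SyncLeaveSketch}}, the two-thread pool standing in for the coordinator, and the artificial delay are all assumptions made for the example; the real change would be in how the LEAVE command is sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a two-thread pool stands in for the coordinator,
// which may process incoming commands concurrently and out of order.
public class SyncLeaveSketch {

    /** Restarts a node with a synchronous LEAVE and returns the order the coordinator processed commands in. */
    public static List<String> restartWithSyncLeave() {
        List<String> processed = new CopyOnWriteArrayList<>();
        ExecutorService coordinator = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Void> leave = CompletableFuture.runAsync(() -> {
                pause(50); // artificial delay: LEAVE is slow to be processed
                processed.add("LEAVE");
            }, coordinator);

            // Synchronous LEAVE: block until the coordinator has processed it.
            // Without this join(), the JOIN below could be processed first.
            leave.join();

            CompletableFuture.runAsync(() -> processed.add("JOIN"), coordinator).join();
        } finally {
            coordinator.shutdown();
        }
        return new ArrayList<>(processed);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(restartWithSyncLeave());
    }
}
```

The trade-off is that stopping the cache now waits for a round trip to the coordinator, but the JOIN that follows a restart can no longer overtake the LEAVE.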