Dan Berindei created ISPN-2778:
----------------------------------
Summary: When a cache is restarted, the LEAVE and JOIN commands are not ordered
Key: ISPN-2778
URL: https://issues.jboss.org/browse/ISPN-2778
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 5.2.0.Final
The LEAVE command is sent asynchronously, so if the cache is restarted, the coordinator
may process the new JOIN command before the LEAVE command.
This breaks the restart: because the joining node is still present in the consistent
hash when its JOIN is processed, it does not perform any state transfer. When the
coordinator then processes the LEAVE, the node receives a topology update with itself
removed from the consistent hash.
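
To make the effect of the ordering concrete, here is a minimal sketch of a toy coordinator that tracks only cache membership. It is not Infinispan code: {{ToyCoordinator}}, {{handle}} and {{membersAfterRestart}} are hypothetical names invented for this illustration, and state transfer is reduced to "did the membership change".

```java
import java.util.ArrayList;
import java.util.List;

public class LeaveJoinOrdering {
    enum Type { JOIN, LEAVE }

    /** Hypothetical stand-in for the coordinator: it only tracks cache members. */
    static final class ToyCoordinator {
        private final List<String> members = new ArrayList<>();

        ToyCoordinator(String... initialMembers) {
            members.addAll(List.of(initialMembers));
        }

        /** Returns true if the command changed the membership. */
        boolean handle(Type type, String node) {
            if (type == Type.JOIN) {
                if (members.contains(node)) {
                    return false; // already a member: nothing to rebalance
                }
                members.add(node);
                return true;
            }
            return members.remove(node); // LEAVE
        }

        List<String> members() {
            return List.copyOf(members);
        }
    }

    /** Simulates NodeH restarting its cache while NodeG is the coordinator. */
    static List<String> membersAfterRestart(boolean leaveProcessedFirst) {
        ToyCoordinator coordinator = new ToyCoordinator("NodeG", "NodeH");
        if (leaveProcessedFirst) {
            coordinator.handle(Type.LEAVE, "NodeH");
            coordinator.handle(Type.JOIN, "NodeH");
        } else {
            // The problematic order: the JOIN is a no-op, then the LEAVE removes the node.
            coordinator.handle(Type.JOIN, "NodeH");
            coordinator.handle(Type.LEAVE, "NodeH");
        }
        return coordinator.members();
    }

    public static void main(String[] args) {
        System.out.println("LEAVE then JOIN: " + membersAfterRestart(true));
        System.out.println("JOIN then LEAVE: " + membersAfterRestart(false));
    }
}
```

With the LEAVE processed first the restarted node ends up as a member again; with the JOIN processed first it ends up removed, even though its cache is running.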
I have seen one failure because of this in
{{StateTransferFunctionalTest.testInitialStateTransferAfterRestart}}:
{noformat}
03:25:36,749 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport]
dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=LEAVE,
sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=0
03:25:36,770 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport]
dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=JOIN,
sender=NodeH-44562,
joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@335703e5,
hashFunction=org.infinispan.commons.hash.MurmurHash3@64b6f0a5, numSegments=60,
numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1}, mode=SYNCHRONOUS, timeout=240000
03:25:36,771 TRACE (OOB-1,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to
execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=JOIN,
sender=NodeH-44562,
joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@3aea6b42,
hashFunction=org.infinispan.commons.hash.MurmurHash3@7427d845, numSegments=60,
numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,771 TRACE (testng-StateTransferFunctionalTest:) [StateTransferManagerImpl]
Installing new cache topology CacheTopology{id=2,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396, NodeH-44562]}, pendingCH=null} on
cache nbst
03:25:36,782 TRACE (OOB-2,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to
execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=LEAVE,
sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null,
throwable=null, viewId=1} [sender=NodeH-44562]
03:25:36,840 TRACE (OOB-2,ISPN,NodeG-42396:nbst nbst) [StateTransferManagerImpl]
Installing new cache topology CacheTopology{id=3,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
03:25:36,852 TRACE (OOB-2,ISPN,NodeH-44562:nbst) [StateTransferManagerImpl] Installing new
cache topology CacheTopology{id=3,
currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
{noformat}
The solution is to make the LEAVE command synchronous.
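
As a rough sketch of why a synchronous LEAVE restores the ordering, the following models the sender with plain {{java.util.concurrent}} primitives. It does not use the Infinispan or JGroups APIs; {{SyncLeaveSketch}} and {{processingOrder}} are hypothetical names, and the asynchronous case deliberately forces the unlucky schedule with a latch rather than relying on timing.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class SyncLeaveSketch {
    /** Returns the order in which a toy coordinator processes the two commands. */
    static List<String> processingOrder(boolean syncLeave) {
        List<String> processed = new CopyOnWriteArrayList<>();
        CountDownLatch joinProcessed = new CountDownLatch(1);

        // The LEAVE is handled on another thread, as an asynchronous command would be.
        CompletableFuture<Void> leave = CompletableFuture.runAsync(() -> {
            if (!syncLeave) {
                // Model the unlucky schedule: the LEAVE is delayed until after the JOIN.
                try {
                    joinProcessed.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            processed.add("LEAVE");
        });

        if (syncLeave) {
            leave.join(); // the sender blocks until the LEAVE has been processed
        }

        processed.add("JOIN"); // the JOIN is sent synchronously in both cases
        joinProcessed.countDown();
        leave.join(); // wait for the LEAVE so the returned order is complete
        return List.copyOf(processed);
    }

    public static void main(String[] args) {
        System.out.println("async LEAVE: " + processingOrder(false));
        System.out.println("sync LEAVE:  " + processingOrder(true));
    }
}
```

Blocking on the LEAVE before sending the JOIN means the sender itself enforces the order, so the coordinator cannot observe the two commands reversed.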