[
https://issues.jboss.org/browse/ISPN-925?page=com.atlassian.jira.plugin.s...
]
Mircea Markus commented on ISPN-925:
------------------------------------
After fixing this race condition and running the test again, the
intermittent failure doesn't go away.
Another race condition was spotted, here is the cause of it:
- DistributionInterceptor.visitGetKeyValueCommand determines
1. boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress();
- this method relies on the list of leavers from dmImpl (see
dmImpl.isRehashInProgress), which at this point is empty, so it returns false.
2. the list of leavers is updated and a new hash ch2 is installed
3. at this point ch2 knows that k is mapped to the local node. But as
isRehashInProgress==false (calculated at step 1) the get doesn't go remote.
The problem here is that isRehashInProgress should be calculated atomically with the
consistent hash update in step 2, so that a get never sees one without the other.
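One possible way to keep the flag and the hash consistent is to publish them together as a single immutable snapshot. The sketch below only illustrates that idea and is not the actual fix: TopologySnapshotSketch, Topology, onLeave, onRehashComplete and mustGoRemote are hypothetical names, and a plain set of locally owned keys stands in for the consistent hash.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these types or methods are Infinispan classes.
class TopologySnapshotSketch {

    /** Immutable view pairing the hash state with the rehash flag. */
    static final class Topology {
        final Set<String> locallyOwnedKeys; // stand-in for a consistent hash
        final boolean rehashInProgress;

        Topology(Set<String> locallyOwnedKeys, boolean rehashInProgress) {
            this.locallyOwnedKeys = new HashSet<>(locallyOwnedKeys);
            this.rehashInProgress = rehashInProgress;
        }
    }

    private final AtomicReference<Topology> topology;

    TopologySnapshotSketch(Set<String> initiallyOwnedKeys) {
        this.topology = new AtomicReference<>(new Topology(initiallyOwnedKeys, false));
    }

    /** A leave installs the new ownership and raises the flag in one atomic swap. */
    void onLeave(Set<String> ownedKeysAfterLeave) {
        topology.set(new Topology(ownedKeysAfterLeave, true));
    }

    /** Clears the flag once state has been moved, keeping the same ownership. */
    void onRehashComplete() {
        topology.updateAndGet(t -> new Topology(t.locallyOwnedKeys, false));
    }

    /** The reader takes one snapshot, so flag and ownership always belong together. */
    boolean mustGoRemote(String key) {
        Topology t = topology.get();
        return t.rehashInProgress || !t.locallyOwnedKeys.contains(key);
    }
}
```

With this shape a get can no longer combine a flag computed before the leave with a hash installed after it, because both come from the same reference read.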
race condition in DistributionManagerImpl
-----------------------------------------
Key: ISPN-925
URL:
https://issues.jboss.org/browse/ISPN-925
Project: Infinispan
Issue Type: Bug
Components: Core API, State transfer
Affects Versions: 4.2.0.Final
Reporter: Mircea Markus
Assignee: Manik Surtani
Priority: Critical
Fix For: 4.2.1.Final, 5.0.0.ALPHA3, 5.0.0.Final
This is causing StateTransferLargeObjectTest to fail intermittently (about 1/500 runs).
Nasty race condition in DistributionManagerImpl
1. if a node leaves then the new consistent hash is first set ( consistentHash =
ConsistentHashHelper.removeAddress(consistentHash, leaver, configuration, topologyInfo);)
2. then an InvertedLeaveTask is triggered if needed
3. this would add the leaver to DMI.leavers and set the rehashInProgress flag of DMImpl to
true (RehashTask.call)
Now if a get call happens between 1 and 3 then the system would not go remotely.
The "go remotely if there's a rehash going on" condition happens in
DistInterceptor.visitGetKeyValueCommand:
boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress();
If isRehashInProgress is true then we go remotely even if the key is mapped to
the local node. Nasty!
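The window between steps 1 and 3 can be replayed deterministically in a small model. This is only a sketch under stated assumptions: LeaveRaceSketch and its members are hypothetical and not Infinispan code, a single boolean stands in for the consistent hash, and the value is assumed to still live only on a remote node because state has not been transferred yet.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the ordering problem; not Infinispan code.
class LeaveRaceSketch {
    boolean keyOwnedLocally = false;   // what the current hash says about the key
    boolean rehashInProgress = false;  // flag raised by the rehash task
    final Map<String, String> localStore = new HashMap<>();
    final Map<String, String> remoteStore = new HashMap<>();

    /** Step 1: the new hash is installed and now maps the key to this node. */
    void installNewHash() {
        keyOwnedLocally = true;
    }

    /** Step 3: the rehash task starts and raises the flag. */
    void startRehashTask() {
        rehashInProgress = true;
    }

    /** Mirrors the "go remotely if a rehash is going on" check. */
    String get(String key) {
        boolean goRemote = rehashInProgress || !keyOwnedLocally;
        return goRemote ? remoteStore.get(key) : localStore.get(key);
    }
}
```

Calling get after installNewHash() but before startRehashTask() reads the local store, which does not hold the value yet, so the lookup misses. The same call after startRehashTask() goes remote and finds it.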