[infinispan-issues] [JBoss JIRA] Commented: (ISPN-925) race condition in DistributionManagerImpl
Mircea Markus (JIRA)
jira-events at lists.jboss.org
Mon Feb 14 08:00:14 EST 2011
[ https://issues.jboss.org/browse/ISPN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582010#comment-12582010 ]
Mircea Markus commented on ISPN-925:
------------------------------------
After fixing this race condition and rerunning the test, the intermittent failure doesn't go away.
Another race condition was spotted, here is the cause of it:
- DistributionInterceptor.visitGetKeyValueCommand determines
1. boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress();
- this method relies on the list of leavers from dmImpl (see dmImpl.isRehashInProgress), which at this point is empty, so it returns false.
2. the list of leavers is updated and a new hash ch2 is installed
3. at this point ch2 knows that k is mapped to the local node. But as isRehashInProgress==false (calculated at step 1), the get doesn't go remote.
The problem here is that isRehashInProgress should be calculated atomically with the rehash step that installs the new hash (step 2).
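
One way to picture "calculated at the same time" is to publish the hash and the flag as a single immutable snapshot, so a reader can never pair an old flag with a new hash. The following is only a minimal sketch under that assumption. RehashSnapshotSketch, View, onLeave, onRehashComplete and mustGoRemote are hypothetical names, and the consistent hash is reduced to a set of locally owned keys. It is not the actual DistributionManagerImpl code or the fix that was applied.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Infinispan code: every name here is illustrative.
final class RehashSnapshotSketch {

    /** Immutable pair: keys the current hash maps locally, plus the rehash flag. */
    static final class View {
        final Set<String> localKeys;
        final boolean rehashInProgress;

        View(Set<String> localKeys, boolean rehashInProgress) {
            this.localKeys = Collections.unmodifiableSet(new HashSet<>(localKeys));
            this.rehashInProgress = rehashInProgress;
        }
    }

    private final AtomicReference<View> view =
            new AtomicReference<>(new View(Collections.<String>emptySet(), false));

    /** A leave publishes the new hash and the raised flag in one atomic write. */
    void onLeave(Set<String> newLocalKeys) {
        view.set(new View(newLocalKeys, true));
    }

    /** Clears the flag while keeping whatever hash is current. */
    void onRehashComplete() {
        view.updateAndGet(v -> new View(v.localKeys, false));
    }

    /** Decides from a single snapshot, so the flag and the hash cannot disagree. */
    boolean mustGoRemote(String key) {
        View v = view.get();
        return v.rehashInProgress || !v.localKeys.contains(key);
    }
}
```

With this shape, a get that sees ch2 necessarily sees the flag that was published together with ch2, whichever thread wins the race.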
> race condition in DistributionManagerImpl
> -----------------------------------------
>
> Key: ISPN-925
> URL: https://issues.jboss.org/browse/ISPN-925
> Project: Infinispan
> Issue Type: Bug
> Components: Core API, State transfer
> Affects Versions: 4.2.0.Final
> Reporter: Mircea Markus
> Assignee: Manik Surtani
> Priority: Critical
> Fix For: 4.2.1.Final, 5.0.0.ALPHA3, 5.0.0.Final
>
>
> This is causing StateTransferLargeObjectTest to fail intermittently (about 1/500 runs).
> Nasty race condition in DistributionManagerImpl
> 1. if a node leaves then the new consistent hash is first set ( consistentHash = ConsistentHashHelper.removeAddress(consistentHash, leaver, configuration, topologyInfo);)
> 2. then an InvertedLeaveTask is triggered if needed
> 3. this would add the leaver to DMI.leavers and set the rehashInProgress flag of DMImpl to true (RehashTask.call)
> Now if a get call happens between 1 and 3, the system would not go remote. The "go remote if there's a rehash going on" condition is evaluated in DistInterceptor.visitGetKeyValueCommand:
> boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress();
> if isRehashInProgress is set to true then we go remote even if the key is mapped to the local node. Nasty!
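
The window between steps 1 and 3 in the description can be replayed deterministically in a single thread. This is a rough sketch only. LeaveOrderingSketch and its methods are hypothetical, the hash is reduced to a string label, and no real Infinispan classes are involved. Note that raising the flag first only closes this particular window: a reader that samples the flag before the hash can still be caught, as the comment above describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of the interleaving; not Infinispan code.
final class LeaveOrderingSketch {
    private volatile String hash = "ch1";             // stands in for consistentHash
    private volatile boolean rehashInProgress = false;

    /** What a get would observe at this instant. */
    String observe() {
        return hash + "/rehash=" + rehashInProgress;
    }

    /** Order from the description: new hash first (step 1), flag later (step 3). */
    List<String> hashThenFlag() {
        List<String> seen = new ArrayList<>();
        hash = "ch2";
        seen.add(observe());      // a get landing between step 1 and step 3
        rehashInProgress = true;
        seen.add(observe());
        return seen;
    }

    /** Alternative order: raise the flag before the new hash becomes visible. */
    List<String> flagThenHash() {
        List<String> seen = new ArrayList<>();
        rehashInProgress = true;
        seen.add(observe());      // a get landing between the two writes
        hash = "ch2";
        seen.add(observe());
        return seen;
    }
}
```

In the first ordering the intermediate observation pairs the new hash with a flag that is still false, which is the state in which the get stays local.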
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira