[infinispan-issues] [JBoss JIRA] Commented: (ISPN-925) race condition in DistributionManagerImpl

Mircea Markus (JIRA) jira-events at lists.jboss.org
Mon Feb 14 08:00:14 EST 2011


    [ https://issues.jboss.org/browse/ISPN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582010#comment-12582010 ] 

Mircea Markus commented on ISPN-925:
------------------------------------

After fixing this race condition and rerunning the test, the intermittent failure doesn't go away.
Another race condition was spotted; here is the cause of it:
- DistributionInterceptor.visitGetKeyValueCommand determines 
1. boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress(); 
- this method relies on the list of leavers from dmImpl (see dmImpl.isRehashInProgress), which at this point is empty, so it returns false.
2. the list of leavers is updated and a new hash ch2 is installed
3. at this point ch2 knows that k is mapped to the local node. But as isRehashInProgress==false (calculated at step 1) it doesn't go remote.

The problem here is that isRehashInProgress should be calculated together with the hash update in step 2, so that a reader sees the flag and the installed hash change as a single step.
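
One way to picture "calculated together" is to publish the hash and the flag as a single immutable snapshot. The following is only a minimal sketch of that idea, not the actual DistributionManagerImpl code or the fix that was applied: the names RehashSnapshotSketch, View and mustGoRemote are hypothetical, and a plain owner list with modulo hashing stands in for the real consistent hash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the distribution manager; NOT Infinispan's API.
class RehashSnapshotSketch {

    // Immutable pair: the installed hash (simplified to an owner list)
    // and the rehash flag that belongs to exactly that hash.
    static final class View {
        final List<String> owners;
        final boolean rehashInProgress;

        View(List<String> owners, boolean rehashInProgress) {
            this.owners = Collections.unmodifiableList(new ArrayList<>(owners));
            this.rehashInProgress = rehashInProgress;
        }

        // Simplified placement: modulo over the owner list (assumption).
        boolean isLocal(String key, String self) {
            int idx = Math.floorMod(key.hashCode(), owners.size());
            return owners.get(idx).equals(self);
        }
    }

    private final String self;
    private final AtomicReference<View> view;

    RehashSnapshotSketch(String self, List<String> owners) {
        this.self = self;
        this.view = new AtomicReference<>(new View(owners, false));
    }

    // A leave installs the new hash and raises the flag in one atomic write,
    // so no reader can see the new hash paired with the old flag.
    void onLeave(String leaver) {
        view.updateAndGet(v -> {
            List<String> remaining = new ArrayList<>(v.owners);
            remaining.remove(leaver);
            return new View(remaining, true);
        });
    }

    // Called once the rehash has finished; the hash itself is unchanged.
    void rehashCompleted() {
        view.updateAndGet(v -> new View(v.owners, false));
    }

    // Read path: take ONE snapshot and decide from it alone.
    boolean mustGoRemote(String key) {
        View v = view.get();
        return v.rehashInProgress || !v.isLocal(key, self);
    }
}
```

With this shape the read in step 1 and the hash lookup in step 3 come from the same View, so a get either sees the old hash with the flag unset or the new hash with the flag set, never the mix described above.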

> race condition in DistributionManagerImpl
> -----------------------------------------
>
>                 Key: ISPN-925
>                 URL: https://issues.jboss.org/browse/ISPN-925
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core API, State transfer
>    Affects Versions: 4.2.0.Final
>            Reporter: Mircea Markus
>            Assignee: Manik Surtani
>            Priority: Critical
>             Fix For: 4.2.1.Final, 5.0.0.ALPHA3, 5.0.0.Final
>
>
> This is causing StateTransferLargeObjectTest to fail intermittently (about 1/500 runs).
> Nasty race condition in DistributionManagerImpl
> 1. if a node leaves then the new consistent hash is first set (consistentHash = ConsistentHashHelper.removeAddress(consistentHash, leaver, configuration, topologyInfo);)
> 2. then an InvertedLeaveTask is triggered if needed
> 3. this would add the leaver to DMI.leavers and set the rehashInProgress flag of DMImpl to true (RehashTask.call)
> Now if a get call happens between 1 and 3 then the system would not go remote. The "go remote if there's a rehash going on" condition happens in DistInterceptor.visitGetKeyValueCommand: 
> boolean isRehashInProgress = !dm.isJoinComplete() || dm.isRehashInProgress(); 
> if isRehashInProgress is set to true then we go remote even if the key is mapped to the local node. Nasty!
