[infinispan-dev] Eviction overhaul

Manik Surtani manik at jboss.org
Tue Apr 14 11:52:09 EDT 2009


On 14 Apr 2009, at 16:40, Manik Surtani wrote:

> Guys
>
> Taking this a few steps further, just want to run the algo by a few  
> more minds.  :-)
>
> This is based on ideas in "Lock Free Deques and Doubly Linked Lists"  
> by H. Sundell and P. Tsigas [1], M. Michael's "High Performance  
> Dynamic Lock-Free Hash Tables and List-Based Sets" [2], as well as  
> some ideas from JDK6's ConcurrentSkipListMap impl.
>
> Basically I have heavily simplified and pared down the algorithms to  
> suit the limited set of functionality as well as the fact that the  
> linked list is backed by a concurrent hash map (implemented similar  
> to the JDK's CHM, using lockable segments) for quick lookup.
>
> 1.  Entries need to be linked, so as to be traversed in order.  Both  
> FIFO and LRU will be supported.
> 2.  Need to be doubly linked since direct lookup of a "previous"  
> entry via hash function is not possible.
> 3.  Only forward traversal is necessary.  The prev pointer is purely  
> to get a hold of an entry behind the current one.
> 4.  Needs to be concurrent and lock-free (with the exception of  
> segment locks)
>
>
> Here is the set of functionality that affects linking, based on the  
> existing DataContainer interface.
>
> 1.  adding a new entry.  (involves locking a segment, adding the  
> entry to the segment, linking at end, unlocking)
> 2.  removing an entry (lock segment, remove from segment, unlink,  
> unlock)
> 3.  update an existing entry (lock segment, update entry, unlink,  
> link at end, unlock)
> 4.  visit an entry (link at end)
>
> 3 and 4 are only relevant for the LRU container, not the FIFO  
> container.
>
> So, as far as the links are concerned, we only need to support 2  
> high-level functions: unlink and link_at_end.
>
> LinkedEntry {
> 	LinkedEntry prev, next
> 	InternalCacheEntry value
> }
>
> unlink
> ------
> U1. CAS entry.prev.next to entry.next if entry.prev.next is entry
> U2. CAS entry.next.prev to entry.prev if entry.next.prev is entry
>
> link at end
> -----------
> L1. entry.next = tail
> L2. entry.prev = tail.prev
> L3. CAS entry.prev.next to entry if entry.prev.next is tail.prev,  
> else L2

Sorry, that should read "... if entry.prev.next is tail ..."

>
> L4. tail.prev = this
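
A rough Java sketch of these two operations (class and method names here are purely illustrative, the value field stands in for InternalCacheEntry, and head/tail are sentinel entries):

```java
import java.util.concurrent.atomic.AtomicReference;

// Rough sketch of unlink / link-at-end as described above. Names are
// illustrative only; head and tail are sentinel entries.
class LinkSketch {
    static class LinkedEntry {
        final AtomicReference<LinkedEntry> prev = new AtomicReference<>();
        final AtomicReference<LinkedEntry> next = new AtomicReference<>();
        final Object value; // InternalCacheEntry in the real container
        LinkedEntry(Object value) { this.value = value; }
    }

    final LinkedEntry head = new LinkedEntry(null);
    final LinkedEntry tail = new LinkedEntry(null);

    LinkSketch() {
        head.next.set(tail);
        tail.prev.set(head);
    }

    // U1/U2: best-effort splice; a failed CAS means another thread has
    // already helped, so failures are simply ignored.
    void unlink(LinkedEntry e) {
        LinkedEntry p = e.prev.get(), n = e.next.get();
        p.next.compareAndSet(e, n); // U1: only if p.next is still e
        n.prev.compareAndSet(e, p); // U2: only if n.prev is still e
    }

    // L1-L4: append just before the tail sentinel, retrying on contention.
    void linkAtEnd(LinkedEntry e) {
        e.next.set(tail);                       // L1
        for (;;) {
            LinkedEntry p = tail.prev.get();
            e.prev.set(p);                      // L2
            if (p.next.compareAndSet(tail, e))  // L3: expect tail (per the
                break;                          //     correction above), else L2
        }
        tail.prev.set(e);                       // L4
    }
}
```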
>
>
> Reading
> -------
> When reading an entry, if it is marked as deleted (a marked entry is  
> identified by its value being set to null, which is an illegal  
> state) it is unlinked (this "helper" operation can be performed by  
> any thread regardless of function) and a null is returned as though  
> the entry did not exist.
>
> Iterating
> ---------
> When iterating, if the next pointer reaches an entry that is marked  
> as deleted, the entry is unlinked and the next entry is used.  It is  
> important to note that iteration over prev pointers is not supported  
> and may not work.  Prev pointers are only there to get a reference  
> to an entry "somewhere" behind the current one.
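
A sketch of that read/iterate rule, simplified to plain pointer writes rather than the CAS-based unlink, with illustrative names:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the read/iterate rule above: a null value marks a deleted
// entry, and any thread that encounters one helps by unlinking it and
// skipping on. Simplified to plain writes; names are illustrative only.
class IterSketch {
    static class LinkedEntry {
        final AtomicReference<Object> value = new AtomicReference<>();
        volatile LinkedEntry prev, next;
        LinkedEntry(Object v) { value.set(v); }
    }

    // Returns the next live entry after e, unlinking any marked entries
    // encountered on the way (the "helper" behaviour described above).
    static LinkedEntry nextLive(LinkedEntry e, LinkedEntry tail) {
        LinkedEntry n = e.next;
        while (n != tail && n.value.get() == null) {
            LinkedEntry nn = n.next; // splice the marked entry out
            e.next = nn;
            nn.prev = e;
            n = nn;
        }
        return n;
    }
}
```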
>
> Markers
> -------
> Unlike CSLM, dummy marker nodes are not needed since null values are  
> not supported (we always need an InternalCacheEntry to be present in  
> a LinkedEntry), so CAS'ing this reference to null marks an entry  
> as invalid/marked for deletion.
>
> Let me know what you think, the implementation should be  
> significantly simpler and more efficient than what we have now.
>
> Cheers
> Manik
>
> [1] http://www.md.chalmers.se/~tsigas/papers/Lock-Free-Deques-Doubly-Lists-JPDC.pdf 
>  (2008)
> [2] http://www.research.ibm.com/people/m/michael/spaa-2002.pdf (2002)
>
>
> On 2 Apr 2009, at 16:31, Manik Surtani wrote:
>
>> Ok, taking this further.  After some discussion with Jason, there  
>> actually is a better concurrent linking algorithm we can use that  
>> is simpler and more efficient (no spin locks or auxiliary nodes),  
>> but a bit more delicate with node states.  This is what is used in  
>> the linking in JDK6's ConcurrentSkipListMap, but I've quite heavily  
>> modified it for our purposes.  I haven't implemented this yet, but  
>> here are the ideas for you guys to tear apart.  Looking forward to  
>> your feedback.  :-)
>>
>> Concurrent ordered data container (for FIFO, LRU containers)
>> --------------------------------------------------------------------------------
>>
>> The main constraints on this container are that it should provide  
>> constant-time get(), contains(), put() and remove() operations,  
>> and the iterator should be ordered and offer constant-time  
>> traversal.
>>
>> This implementation is based on threadsafe linked lists used in  
>> ConcurrentSkipListMap (CSLM) in Sun's JDK 6.  In addition, the  
>> implementation uses a lockable Segment array (as in CHM) except  
>> that in the HashEntries, keys are always Object and values are  
>> LinkedEntries (LEs).  LEs always have a reference to an  
>> InternalCacheEntry, as well as volatile next and prev pointers.
>>
>> Also, unlike CSLM Nodes, LEs are doubly linked since to update the  
>> link on the predecessor, we need a reference to it (as LEs are  
>> located via a hash lookup rather than traversal).  This alters the  
>> logic of insertions and removals somewhat.
>>
>> Removal
>> -------
>> 1.  Lock segment containing key
>> 2.  Locate LinkedEntry using a hash lookup
>> 3.  If this is null, or its value is null, return null.
>> 4.  CAS LE.value to null.  If this fails, ignore and return.   
>> Someone else is probably removing it.
>> 5.  Unlink.  If unlinking fails, ignore and continue; will be  
>> marked and removed later.
>> 6.  Release segment lock
>>
>> Unlink
>> ------
>> 1.  Create Marker LE (MLE)
>> 2.  MLE.next = LE.next
>> 3.  CAS LE.next to MLE (expecting MLE.next)
>> 4.  CAS LE.prev.next to MLE.next (expecting LE)
>> 5.  On failure to CAS, ignore
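
A sketch of steps 1-5, with illustrative names (the marker, MLE, is just an LE with a null value whose next pointer points past the deleted entry):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the marker-based unlink (steps 1-5 above). The marker (MLE)
// is CAS'd in after LE so that a concurrent insert after LE must fail,
// then the predecessor is swung past LE. Names are illustrative only.
class MarkerUnlinkSketch {
    static class LE {
        final Object value; // null here means "marker"
        final AtomicReference<LE> next = new AtomicReference<>();
        volatile LE prev;
        LE(Object v) { value = v; }
    }

    static void unlink(LE le) {
        LE succ = le.next.get();
        LE marker = new LE(null);                 // 1. create marker LE (MLE)
        marker.next.set(succ);                    // 2. MLE.next = LE.next
        if (!le.next.compareAndSet(succ, marker)) // 3. CAS LE.next to MLE
            return;                               // 5. on failure to CAS, ignore
        le.prev.next.compareAndSet(le, succ);     // 4. CAS LE.prev.next to
    }                                             //    MLE.next (expecting LE)
}
```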
>>
>> Insertion
>> ---------
>>
>> 1.  Lock segment containing key
>> 2.  Create new LE, populate with value, insert into hash table
>> 3.  Insert link
>> 4.  Release segment lock
>>
>> Insert link
>> -----------
>> 1.  LE.next = HEAD
>> 2.  LE.prev = HEAD.prev
>> 3.  CAS HEAD.prev = LE (if HEAD.prev == LE.prev)
>> 4.  If the CAS fails, start again at step 2
>> 5.  LE.prev.next = LE (if LE.prev.next = HEAD)
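
A sketch of that insert-link loop, assuming a list that is circular through a HEAD sentinel (which the steps above imply) and illustrative names:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of "insert link" (steps 1-5 above): a new LE is appended just
// before the HEAD sentinel of a circular list, retrying the CAS on
// contention. Names are illustrative only.
class InsertLinkSketch {
    static class LE {
        final AtomicReference<LE> prev = new AtomicReference<>();
        final AtomicReference<LE> next = new AtomicReference<>();
        final Object value;
        LE(Object v) { value = v; }
    }

    final LE head = new LE(null);

    InsertLinkSketch() {
        head.next.set(head); // empty circular list
        head.prev.set(head);
    }

    void insertLink(LE le) {
        le.next.set(head);                          // 1. LE.next = HEAD
        for (;;) {
            LE p = head.prev.get();
            le.prev.set(p);                         // 2. LE.prev = HEAD.prev
            if (head.prev.compareAndSet(p, le))     // 3. CAS HEAD.prev to LE
                break;                              // 4. on CAS failure, retry
        }
        le.prev.get().next.compareAndSet(head, le); // 5. LE.prev.next = LE
    }                                               //    (if it is still HEAD)
}
```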
>>
>>
>> Update (E.g., for LRU visit or update)
>> ------
>> 1.  Lock segment containing key (only if update)
>> 2.  if LE == null || LE.value == null return
>> 3.  if update, update value.  No need for a CAS here since we have  
>> a segment lock
>> 4.  CAS LE.prev.next = LE.next (expecting LE, ignore failure)
>> 5.  CAS LE.next.prev = LE.prev (expecting LE, ignore failure)
>> 6.  insert LE
>> 7.  Release segment lock (if update)
>>
>> Get
>> ---
>> 1.  Retrieve LE
>> 2.  if (LE.value==null) unlink LE, return null.
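
Putting the segment-locked parts together, a rough sketch (a single HashMap guarded by a ReentrantLock stands in for one CHM-style segment; the link manipulation is elided, and names are illustrative only):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the segment-locked removal and get described above. One locked
// HashMap models a single CHM-style segment; unlinking is elided.
class SegmentSketch {
    static class LE {
        final AtomicReference<Object> value = new AtomicReference<>();
        LE(Object v) { value.set(v); }
    }

    final ReentrantLock lock = new ReentrantLock(); // one segment's lock
    final Map<Object, LE> segment = new HashMap<>();

    Object remove(Object key) {
        lock.lock();                              // 1. lock segment
        try {
            LE le = segment.get(key);             // 2. hash lookup
            if (le == null) return null;          // 3. absent...
            Object v = le.value.get();
            if (v == null) return null;           // 3. ...or already marked
            if (!le.value.compareAndSet(v, null)) // 4. mark by CAS to null;
                return null;                      //    someone else is removing it
            segment.remove(key);                  //    drop from the hash table
            // 5. unlink would go here; a failed unlink is ignored, as the
            //    marked entry is cleaned up lazily by readers
            return v;
        } finally {
            lock.unlock();                        // 6. release segment lock
        }
    }

    Object get(Object key) {
        LE le = segment.get(key);                 // 1. retrieve LE
        if (le == null) return null;
        return le.value.get();                    // 2. null means marked/deleted
    }
}
```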
>>
>> On 2 Apr 2009, at 10:00, Manik Surtani wrote:
>>
>>>
>>> On 2 Apr 2009, at 09:10, Mircea Markus wrote:
>>>
>>>> Manik Surtani wrote:
>>>>> Hello all.
>>>>>
>>>>> I have finished my work with the eviction code in Horizon, here  
>>>>> is a summary of what has happened.
>>>>>
>>>>> From a user perspective (including API and configuration) as  
>>>>> well as a design overview, please have a look at
>>>>> http://www.jboss.org/community/docs/DOC-13449
>>>>>
>>>>> From an implementation perspective, have a look at the srcs of  
>>>>> FIFODataContainer and LRUDataContainer.  These two classes are  
>>>>> where everything happens.  The javadocs should explain the  
>>>>> details, but in a nutshell you can expect constant time  
>>>>> operations for all puts, gets, removes, iterations.  :-)
>>>>> Feedback on the impls would be handy.  :-)
>>>> I like the concept - it's nice and clear for users to grasp. Also  
>>>> took a look at the implementation; it's nice but complex, so I'd  
>>>> rather have you and a whiteboard to understand it completely :) .
>>>> Some notes though: EvictionException is never used, same for  
>>>> SpinLock.rl field.
>>>
>>> Good point, EvictionException was a carry-over from JBC which can  
>>> go away.
>>>
>>> SpinLock.rl was something I used for debugging/tracking race  
>>> conditions - it should go away now as well.
>>>
>>>> And a question:
>>>> with this impl, we're doing locking twice for cache operations:  
>>>> in the LockInterceptor and in the DataContainer itself. Why is  
>>>> this necessary? I think this is needed so that the eviction  
>>>> thread and user threads do not conflict (am I right?).
>>>
>>> Well, the locking is done on a different level.  The  
>>> LockingInterceptor locks a key.  So keep in mind that 2 threads  
>>> may lock 2 separate keys concurrently, but these may map to the  
>>> same hash bucket in the data container.  In which case, since the  
>>> DC uses a CHM, this would mean that the same stripe is locked  
>>> preventing concurrent reorganisation of the stripe.
>>>
>>>> Isn't it possible to use the same locking (i.e. LockManager) for  
>>>> both these threads?
>>>
>>> Possible, but again not ideal, for example if you have TX 1 which  
>>> writes to key K1.  The way this currently works, here is what  
>>> happens:
>>>
>>> 1.  Acquire lock for key K1 in the lock manager
>>> 2.  Write to K1
>>> 3.  Sleep?
>>> 4.  Commit:
>>> 5.  Write changes to DC.  This will lock the stripe in the CHM for  
>>> the duration of this step only.
>>> 6.  Release lock acquired in 1
>>>
>>> Now if you have a concurrent write to K2, which *happens* to be in  
>>> the same stripe as K1 in the CHM, this second thread will *not*  
>>> block for TX 1 since TX 1 only locks the CHM stripe for a short  
>>> duration (step 5 above).
>>>
>>> Now if we used the same locks throughout (e.g., step 1 *is* the  
>>> lock stripe in the CHM) then you have far less concurrency.
>>>
>>> Of course, the above premise only holds true if you have enough  
>>> shared locks for the lock manager such that K1 and K2 do not map  
>>> to the same lock, even if they are in the same stripe in the CHM.   
>>> Also holds true if lock striping is disabled and you have a lock  
>>> per key.
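
A tiny illustration of that premise, with a made-up hash-to-stripe mapping (not Infinispan's actual spread function): two distinct keys, each holding its own lock-manager lock, can still collide on the same stripe.

```java
// Illustration of the striping premise above: the stripe index is derived
// from the key's hash, so distinct keys can share a stripe. This mapping
// is made up for illustration, not Infinispan's actual spreader.
class StripeSketch {
    static int stripeFor(Object key, int numStripes) {
        return (key.hashCode() & Integer.MAX_VALUE) % numStripes;
    }
}
```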
>>>
>>>> (Also, why are we using ConcurrentHashMaps in the default  
>>>> container - isn't access to data guarded by the lock interceptor?)
>>>
>>> See above.  :-)
>>>
>>> Cheers
>>> Manik
>>>
>>> --
>>> Manik Surtani
>>> Lead, JBoss Cache
>>> http://www.jbosscache.org
>>> manik at jboss.org
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> horizon-dev mailing list
>>> horizon-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/horizon-dev
>>
>> --
>> Manik Surtani
>> Lead, JBoss Cache
>> http://www.jbosscache.org
>> manik at jboss.org
>>
>>
>>
>>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org






