[infinispan-dev] Eviction overhaul

Manik Surtani manik at jboss.org
Wed May 6 12:02:35 EDT 2009


Thanks to Jason for pointing out a flaw in my impl of the concurrent  
linking and unlinking.

This is fixed now, with closer alignment to Sundell and Tsigas'  
algorithm, reintroducing a Marker entry and the like.  It's in SVN  
trunk; please review it, as feedback would be appreciated.  I'm also  
in the process of blogging about the impl in detail.

Cheers
Manik


On 14 Apr 2009, at 16:52, Manik Surtani wrote:

>
> On 14 Apr 2009, at 16:40, Manik Surtani wrote:
>
>> Guys
>>
>> Taking this a few steps further, just want to run the algo by a few  
>> more minds.  :-)
>>
>> This is based on ideas in "Lock Free Deques and Doubly Linked  
>> Lists" by H. Sundell and P. Tsigas [1], M. Michael's "High  
>> Performance Dynamic Lock-Free Hash Tables and List-Based Sets" [2],  
>> as well as some ideas from JDK6's ConcurrentSkipListMap impl.
>>
>> Basically I have heavily simplified and pared down the algorithms  
>> to suit the limited set of functionality, as well as the fact that  
>> the linked list is backed by a concurrent hash map (implemented  
>> similarly to the JDK's CHM, using lockable segments) for quick lookup.
>>
>> 1.  Entries need to be linked, so they can be traversed in order.   
>> Both FIFO and LRU will be supported.
>> 2.  Entries need to be doubly linked, since direct lookup of a  
>> "previous" entry via the hash function is not possible.
>> 3.  Only forward traversal is necessary.  The prev pointer is  
>> purely to get hold of an entry behind the current one.
>> 4.  The structure needs to be concurrent and lock-free (with the  
>> exception of segment locks).
>>
>>
>> Here is the set of operations that affect linking, based on the  
>> existing DataContainer interface:
>>
>> 1.  add a new entry (lock segment, add entry to segment, link at  
>> end, unlock)
>> 2.  remove an entry (lock segment, remove from segment, unlink,  
>> unlock)
>> 3.  update an existing entry (lock segment, update entry, unlink,  
>> link at end, unlock)
>> 4.  visit an entry (unlink, link at end)
>>
>> 3 and 4 are only relevant for the LRU container, not the FIFO  
>> container.
>>
>> So, as far as the links are concerned, we only need to support 2  
>> high-level functions: unlink and link_at_end.
>>
>> LinkedEntry {
>> 	volatile LinkedEntry prev, next    // CAS'd when linking/unlinking
>> 	volatile InternalCacheEntry value  // null marks the entry as deleted
>> }
>>
>> unlink
>> ------
>> U1. CAS entry.prev.next to entry.next if entry.prev.next is entry
>> U2. CAS entry.next.prev to entry.prev if entry.next.prev is entry
>>
>> link at end
>> -----------
>> L1. entry.next = tail
>> L2. entry.prev = tail.prev
>> L3. CAS entry.prev.next to entry if entry.prev.next is tail.prev,  
>> else L2
>
> Sorry, that should read "... if entry.prev.next is tail ..."
>
>>
>> L4. entry.prev.next = entry; tail.prev = entry
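>>
>> To make this concrete, here is a rough Java sketch of the two ops  
>> (the field updaters and the 'tail' sentinel are illustrative  
>> assumptions, not necessarily the final impl):
>>
>> import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
>>
>> // CAS helpers over the volatile prev/next fields of LinkedEntry
>> static final AtomicReferenceFieldUpdater<LinkedEntry, LinkedEntry> NEXT =
>>     AtomicReferenceFieldUpdater.newUpdater(LinkedEntry.class, LinkedEntry.class, "next");
>> static final AtomicReferenceFieldUpdater<LinkedEntry, LinkedEntry> PREV =
>>     AtomicReferenceFieldUpdater.newUpdater(LinkedEntry.class, LinkedEntry.class, "prev");
>>
>> void unlink(LinkedEntry entry) {
>>     NEXT.compareAndSet(entry.prev, entry, entry.next); // U1: splice out of next chain
>>     PREV.compareAndSet(entry.next, entry, entry.prev); // U2: splice out of prev chain
>> }
>>
>> void linkAtEnd(LinkedEntry entry) { // 'tail' is the list's sentinel (assumed field)
>>     entry.next = tail;                                      // L1
>>     do {
>>         entry.prev = tail.prev;                             // L2: (re)read predecessor
>>     } while (!NEXT.compareAndSet(entry.prev, tail, entry)); // L3: expect tail, else retry L2
>>     tail.prev = entry;                                      // L4
>> }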
>>
>>
>> Reading
>> -------
>> When reading an entry, if it is marked as deleted (a marked entry  
>> is identified by its value being null, which is otherwise an  
>> illegal state), it is unlinked and a null is returned as though the  
>> entry did not exist.  This "helper" unlink may be performed by any  
>> thread, regardless of what that thread is doing.
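>>
>> E.g., a hypothetical read path (hashLookup() is an assumed  
>> CHM-style segment lookup, unlink() as sketched above):
>>
>> InternalCacheEntry get(Object key) {
>>     LinkedEntry le = hashLookup(key);
>>     if (le == null) return null;
>>     InternalCacheEntry v = le.value;
>>     if (v == null) {   // marked as deleted
>>         unlink(le);    // "helper" op: any thread may perform this
>>         return null;   // behave as though the entry never existed
>>     }
>>     return v;
>> }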
>>
>> Iterating
>> ---------
>> When iterating, if the next pointer reaches an entry that is marked  
>> as deleted, the entry is unlinked and the next entry is used.  It  
>> is important to note that iteration over prev pointers is not  
>> supported and may not work.  Prev pointers are only there to get a  
>> reference to an entry "somewhere" behind the current one.
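>>
>> Forward traversal might then look like this (again, names are  
>> illustrative only):
>>
>> LinkedEntry advance(LinkedEntry current) {
>>     LinkedEntry n = current.next;
>>     while (n != tail && n.value == null) { // deleted entry found
>>         unlink(n);                         // helper unlink
>>         n = n.next;                        // skip to the next entry
>>     }
>>     return n;                              // == tail when iteration is done
>> }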
>>
>> Markers
>> -------
>> Unlike in CSLM, dummy marker nodes are not needed.  Null values  
>> are not supported (we always need an InternalCacheEntry to be  
>> present in a LinkedEntry), so CAS'ing the value to null can be used  
>> to mark an entry as invalid/marked for deletion.
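>>
>> So marking an entry is just a CAS on the value field - a sketch,  
>> assuming a VALUE field updater over LinkedEntry.value:
>>
>> static final AtomicReferenceFieldUpdater<LinkedEntry, InternalCacheEntry> VALUE =
>>     AtomicReferenceFieldUpdater.newUpdater(LinkedEntry.class, InternalCacheEntry.class, "value");
>>
>> boolean markDeleted(LinkedEntry le) {
>>     InternalCacheEntry v = le.value;
>>     return v != null && VALUE.compareAndSet(le, v, null); // false = someone beat us to it
>> }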
>>
>> Let me know what you think, the implementation should be  
>> significantly simpler and more efficient than what we have now.
>>
>> Cheers
>> Manik
>>
>> [1] http://www.md.chalmers.se/~tsigas/papers/Lock-Free-Deques-Doubly-Lists-JPDC.pdf 
>>  (2008)
>> [2] http://www.research.ibm.com/people/m/michael/spaa-2002.pdf (2002)
>>
>>
>> On 2 Apr 2009, at 16:31, Manik Surtani wrote:
>>
>>> Ok, taking this further.  After some discussion with Jason, there  
>>> actually is a better concurrent linking algorithm we can use that  
>>> is simpler and more efficient (no spin locks or auxiliary nodes),  
>>> but a bit more delicate with node states.  This is what is used  
>>> for the linking in JDK6's ConcurrentSkipListMap, but I've quite  
>>> heavily modified it for our purposes.  I've not implemented this  
>>> yet, but here are the ideas for you guys to tear apart.  Looking  
>>> forward to your feedback.  :-)
>>>
>>> Concurrent ordered data container (for FIFO, LRU containers)
>>> --------------------------------------------------------------------------------
>>>
>>> The main constraints on this container are that it should provide  
>>> constant-time get(), contains(), put() and remove() operations,  
>>> and the iterator should be ordered and offer constant-time  
>>> traversal.
>>>
>>> This implementation is based on the thread-safe linked lists used in  
>>> ConcurrentSkipListMap (CSLM) in Sun's JDK 6.  In addition, the  
>>> implementation uses a lockable Segment array (as in CHM) except  
>>> that in the HashEntries, keys are always Object and values are  
>>> LinkedEntries (LEs).  LEs always have a reference to an  
>>> InternalCacheEntry, as well as volatile next and prev pointers.
>>>
>>> Also, unlike CSLM Nodes, LEs are doubly linked since to update the  
>>> link on the predecessor, we need a reference to it (as LEs are  
>>> located via a hash lookup rather than traversal).  This alters the  
>>> logic of insertions and removals somewhat.
>>>
>>> Removal
>>> -------
>>> 1.  Lock segment containing key
>>> 2.  Locate LinkedEntry using a hash lookup
>>> 3.  If this is null, or its value is null, return null.
>>> 4.  CAS LE.value to null.  If this fails, ignore and return.   
>>> Someone else is probably removing it.
>>> 5.  Unlink.  If unlinking fails, ignore and continue; will be  
>>> marked and removed later.
>>> 6.  Release segment lock
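>>>
>>> As a sketch (Segment and the VALUE/NEXT/PREV field updaters are  
>>> assumed, CHM-style):
>>>
>>> InternalCacheEntry remove(Object key) {
>>>     Segment s = segmentFor(key);
>>>     s.lock();                                  // 1
>>>     try {
>>>         LinkedEntry le = s.get(key);           // 2: hash lookup
>>>         InternalCacheEntry v = (le == null) ? null : le.value;
>>>         if (v == null) return null;            // 3
>>>         if (!VALUE.compareAndSet(le, v, null)) // 4: mark for removal
>>>             return null;                       //    someone else is removing it
>>>         s.remove(key);                         //    remove from hash table
>>>         unlink(le);                            // 5: best effort; ignore failure
>>>         return v;
>>>     } finally {
>>>         s.unlock();                            // 6
>>>     }
>>> }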
>>>
>>> Unlink
>>> ------
>>> 1.  Create Marker LE (MLE)
>>> 2.  MLE.next = LE.next
>>> 3.  CAS LE.next to MLE (expecting MLE.next)
>>> 4.  CAS LE.prev.next to MLE.next (expecting LE)
>>> 5.  On failure to CAS, ignore
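>>>
>>> In Java this might look like the following (MarkerEntry being an  
>>> assumed LinkedEntry subclass used purely as a marker):
>>>
>>> void unlink(LinkedEntry le) {
>>>     LinkedEntry mle = new MarkerEntry();       // 1
>>>     mle.next = le.next;                        // 2: remember the old successor
>>>     NEXT.compareAndSet(le, mle.next, mle);     // 3: expecting the old successor
>>>     NEXT.compareAndSet(le.prev, le, mle.next); // 4: skip both le and the marker
>>>     // 5: CAS failures are simply ignored; another thread will finish the job
>>> }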
>>>
>>> Insertion
>>> ---------
>>>
>>> 1.  Lock segment containing key
>>> 2.  Create new LE, populate with value, insert into hash table
>>> 3.  Insert link
>>> 4.  Release segment lock
>>>
>>> Insert link
>>> -----------
>>> 1.  LE.next = HEAD
>>> 2.  LE.prev = HEAD.prev
>>> 3.  CAS HEAD.prev = LE (if HEAD.prev == LE.prev)
>>> 4.  If the CAS fails, start again at step 2
>>> 5.  LE.prev.next = LE (if LE.prev.next = HEAD)
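>>>
>>> A sketch (HEAD is the sentinel; PREV/NEXT the assumed field  
>>> updaters):
>>>
>>> void insertLink(LinkedEntry le) {
>>>     le.next = HEAD;                                   // 1
>>>     do {
>>>         le.prev = HEAD.prev;                          // 2: (re)read on retry
>>>     } while (!PREV.compareAndSet(HEAD, le.prev, le)); // 3/4: retry from 2 on failure
>>>     NEXT.compareAndSet(le.prev, HEAD, le);            // 5: expecting HEAD
>>> }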
>>>
>>>
>>> Update (E.g., for LRU visit or update)
>>> ------
>>> 1.  Lock segment containing key (only if update)
>>> 2.  if LE == null || LE.value == null return
>>> 3.  if update, update value.  No need for a CAS here since we have  
>>> a segment lock
>>> 4.  CAS LE.prev.next = LE.next (expecting LE, ignore failure)
>>> 5.  CAS LE.next.prev = LE.prev (expecting LE, ignore failure)
>>> 6.  insert link for LE (see Insert link above)
>>> 7.  Release segment lock (if update)
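>>>
>>> Roughly, for the visit/update case (sketch; the caller handles  
>>> steps 1 and 7, i.e. holds the segment lock, when updating):
>>>
>>> void visit(LinkedEntry le, InternalCacheEntry newValue) {
>>>     if (le == null || le.value == null) return;  // 2: gone, or marked for removal
>>>     if (newValue != null) le.value = newValue;   // 3: plain write under segment lock
>>>     NEXT.compareAndSet(le.prev, le, le.next);    // 4: ignore failure
>>>     PREV.compareAndSet(le.next, le, le.prev);    // 5: ignore failure
>>>     insertLink(le);                              // 6: re-link at HEAD
>>> }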
>>>
>>> Get
>>> ---
>>> 1.  Retrieve LE via hash lookup; if null, return null
>>> 2.  if LE.value == null, unlink LE and return null; otherwise  
>>> return LE.value
>>>
>>> On 2 Apr 2009, at 10:00, Manik Surtani wrote:
>>>
>>>>
>>>> On 2 Apr 2009, at 09:10, Mircea Markus wrote:
>>>>
>>>>> Manik Surtani wrote:
>>>>>> Hello all.
>>>>>>
>>>>>> I have finished my work with the eviction code in Horizon, here  
>>>>>> is a summary of what has happened.
>>>>>>
>>>>>> From a user perspective (including API and configuration) as  
>>>>>> well as a design overview, please have a look at
>>>>>> http://www.jboss.org/community/docs/DOC-13449
>>>>>>
>>>>>> From an implementation perspective, have a look at the srcs of  
>>>>>> FIFODataContainer and LRUDataContainer.  These two classes are  
>>>>>> where everything happens.  The javadocs should explain the  
>>>>>> details, but in a nutshell you can expect constant time  
>>>>>> operations for all puts, gets, removes, iterations.  :-)
>>>>>> Feedback on the impls would be handy.  :-)
>>>>> I like the concept - it's nice and clear for users to grasp.  I  
>>>>> also took a look at the implementation; nice and complex, so I'd  
>>>>> rather have you and a whiteboard to understand it completely :) .  
>>>>> Some notes though: EvictionException is never used, and the same  
>>>>> goes for the SpinLock.rl field.
>>>>
>>>> Good point, EvictionException was a carry-over from JBC which can  
>>>> go away.
>>>>
>>>> SpinLock.rl was something I used for debugging/tracking race  
>>>> conditions - it should go away now as well.
>>>>
>>>>> And a question:
>>>>> with this impl, we're doing locking twice for cache operations:  
>>>>> in the LockInterceptor and in the DataContainer itself.  Why is  
>>>>> this necessary?  I think this is needed so that the eviction  
>>>>> thread and user threads do not conflict (am I right?).
>>>>
>>>> Well, the locking is done on a different level.  The  
>>>> LockingInterceptor locks a key.  So keep in mind that 2 threads  
>>>> may lock 2 separate keys concurrently, but these may map to the  
>>>> same hash bucket in the data container.  In that case, since the  
>>>> DC uses a CHM, the same stripe is locked, preventing concurrent  
>>>> reorganisation of the stripe.
>>>>
>>>>> Isn't it possible to use the same locking (i.e. LockManager) for  
>>>>> both these threads?
>>>>
>>>> Possible, but again not ideal.  For example, suppose TX 1 writes  
>>>> to key K1.  The way this currently works, here is what happens:
>>>>
>>>> 1.  Acquire lock for key K1 in the lock manager
>>>> 2.  Write to K1
>>>> 3.  Sleep?
>>>> 4.  Commit:
>>>> 5.  Write changes to DC.  This will lock the stripe in the CHM  
>>>> for the duration of this step only.
>>>> 6.  Release lock acquired in 1
>>>>
>>>> Now if you have a concurrent write to K2, which *happens* to be  
>>>> in the same stripe as K1 in the CHM, this second thread will  
>>>> *not* block for TX 1 since TX 1 only locks the CHM stripe for a  
>>>> short duration (step 5 above).
>>>>
>>>> Now if we used the same locks throughout (e.g., step 1 *is* the  
>>>> lock stripe in the CHM) then you would have far less concurrency.
>>>>
>>>> Of course, the above premise only holds true if you have enough  
>>>> shared locks in the lock manager that K1 and K2 do not map to  
>>>> the same lock, even if they are in the same stripe in the CHM.   
>>>> It also holds true if lock striping is disabled and you have a  
>>>> lock per key.
>>>>
>>>>> (also, why are we using ConcurrentHashMaps in the default  
>>>>> container - isn't access to data guarded by the lock interceptor?)
>>>>
>>>> See above.  :-)
>>>>
>>>> Cheers
>>>> Manik
>>>>
>>>> --
>>>> Manik Surtani
>>>> Lead, JBoss Cache
>>>> http://www.jbosscache.org
>>>> manik at jboss.org
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> horizon-dev mailing list
>>>> horizon-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/horizon-dev
>>>
>>> --
>>> Manik Surtani
>>> Lead, JBoss Cache
>>> http://www.jbosscache.org
>>> manik at jboss.org
>>>
>>>
>>>
>>>
>>
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>>
>>
>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org






