[infinispan-dev] Eviction overhaul
Manik Surtani
manik at jboss.org
Tue Apr 14 11:52:09 EDT 2009
On 14 Apr 2009, at 16:40, Manik Surtani wrote:
> Guys
>
> Taking this a few steps further, just want to run the algo by a few
> more minds. :-)
>
> This is based on ideas in "Lock Free Deques and Doubly Linked Lists"
> by H. Sundell and P. Tsigas [1], M. Michael's "High Performance
> Dynamic Lock-Free Hash Tables and List-Based Sets" [2], as well as
> some ideas from JDK6's ConcurrentSkipListMap impl.
>
> Basically I have heavily simplified and pared down the algorithms to
> suit the limited set of functionality as well as the fact that the
> linked list is backed by a concurrent hash map (implemented similar
> to the JDK's CHM, using lockable segments) for quick lookup.
>
> 1. Entries need to be linked, so as to be traversed in order. Both
> FIFO and LRU will be supported.
> 2. Need to be doubly linked since direct lookup of a "previous"
> entry via hash function is not possible.
> 3. Only forward traversal is necessary. The prev pointer is purely
> to get a hold of an entry behind the current one.
> 4. Needs to be concurrent and lock-free (with the exception of
> segment locks)
>
>
> Here is the set of functionality that affects linking, that is
> needed, based on the existing DataContainer interface.
>
> 1. adding a new entry. (involves locking a segment, adding the
> entry to the segment, linking at end, unlocking)
> 2. removing an entry (lock segment, remove from segment, unlink,
> unlock)
> 3. update an existing entry (lock segment, update entry, unlink,
> link at end, unlock)
> 4. visit an entry (link at end)
>
> 3 and 4 are only relevant for the LRU container, not the FIFO
> container.
>
> So, as far as the links are concerned, we only need to support 2
> high-level functions: unlink and link_at_end.
>
> LinkedEntry {
> LinkedEntry prev, next
> InternalCacheEntry value
> }
>
> unlink
> ------
> U1. CAS entry.prev.next to entry.next if entry.prev.next is entry
> U2. CAS entry.next.prev to entry.prev if entry.next.prev is entry
>
> link at end
> -----------
> L1. entry.next = tail
> L2. entry.prev = tail.prev
> L3. CAS entry.prev.next to entry if entry.prev.next is tail.prev,
> else L2
Sorry, that should read "... if entry.prev.next is tail ..."
>
> L4. tail.prev = this
>
>
> Reading
> -------
> When reading an entry, if it is marked as deleted (a marked entry is
> identified by its value being set to null, which is an illegal
> state) it is unlinked (this "helper" operation can be performed by
> any thread regardless of function) and a null is returned as though
> the entry did not exist.
>
> Iterating
> ---------
> When iterating, if the next pointer reaches an entry that is marked
> as deleted, the entry is unlinked and the next entry is used. It is
> important to note that iteration over prev pointers is not supported
> and may not work. Prev pointers are only there to get a reference
> to an entry "somewhere" behind the current one.
>
> Markers
> -------
> Unlike CSLM, dummy marker nodes are not needed since null values are
> not supported (we always need an InternalCacheEntry to be present in
> a LinkedEntry) so CAS'ing this to null can be used to mark an entry
> as invalid/marked for deletion.
>
> Let me know what you think, the implementation should be
> significantly simpler and more efficient than what we have now.
>
> Cheers
> Manik
>
> [1] http://www.md.chalmers.se/~tsigas/papers/Lock-Free-Deques-Doubly-Lists-JPDC.pdf
> (2008)
> [2] http://www.research.ibm.com/people/m/michael/spaa-2002.pdf (2002)
>
>
> On 2 Apr 2009, at 16:31, Manik Surtani wrote:
>
>> Ok, taking this further. After some discussion with Jason, there
>> actually is a better concurrent linking algorithm we can use that
>> is simpler and more efficient (no spin locks or auxillary nodes),
>> but a bit more delicate with node states. THis is what is used in
>> the linking in JDK6's ConcurrentSkipListMap, but I've quite heavily
>> modified it for our purposes. Not implemented this as yet, but
>> here are the ideas for you guys to tear apart. Looking forward to
>> your feedback. :-)
>>
>> Concurrent ordered data container (for FIFO, LRU containers)
>> --------------------------------------------------------------------------------
>>
>> The main constraints on this container are that it should provide
>> constant time operations for get(), contains(), put() and remove()
>> operations, and the iterator should be ordered and offer constant
>> time traversal.
>>
>> This implementation is based on threadsafe linked lists used in
>> ConcurrentSkipListMap (CSLM) in Sun's JDK 6. In addition, the
>> implementation uses a lockable Segment array (as in CHM) except
>> that in the HashEntries, keys are always Object and values are
>> LinkedEntries (LEs). LEs always have a reference to an
>> InternalCacheEntry, as well as volatile next and prev pointers.
>>
>> Also, unlike CSLM Nodes, LEs are doubly linked since to update the
>> link on the predecessor, we need a reference to it (as LEs are
>> located via a hash lookup rather than traversal). This alters the
>> logic of insertions and removals somewhat.
>>
>> Removal
>> -------
>> 1. Lock segment containing key
>> 2. Locate LinkedEntry using a hash lookup
>> 3. If this is null, or its value is null, return null.
>> 4. CAS LE.value to null. If this fails, ignore and return.
>> Someone else is probably removing it.
>> 5. Unlink. If unlinking fails, ignore and continue; will be
>> marked and removed later.
>> 6. Release segment lock
>>
>> Unlink
>> ------
>> 1. Create Marker LE (MLE)
>> 2. MLE.next = LE.next
>> 3. CAS LE.next to MLE (expecting MLE.next)
>> 4. CAS LE.prev.next to MLE.next (expecting LE)
>> 5. On failure to CAS, ignore
>>
>> Insertion
>> ---------
>>
>> 1. Lock segment containing key
>> 2. Create new LE, populate with value, insert into hash table
>> 3. Insert link
>> 4. Release segment lock
>>
>> Insert link
>> -----------
>> 1. LE.next = HEAD
>> 2. LE.prev = HEAD.prev
>> 3. CAS HEAD.prev = LE (if HEAD.prev == LE.prev)
>> 4. If the CAS fails, start again at step 4
>> 5. LE.prev.next = LE (if LE.prev.next = HEAD)
>>
>>
>> Update (E.g., for LRU visit or update)
>> ------
>> 1. Lock segment containing key (only if update)
>> 2. if LE == null || LE.value == null return
>> 3. if update, update value. No need for a CAS here since we have
>> a segment lock
>> 4. CAS LE.prev.next = LE.next (expecting LE, ignore failure)
>> 5. CAS LE.next.prev = LE.prev (expecting LE, ignore failure)
>> 6. insert LE
>> 7. Release segment lock (if update)
>>
>> Get
>> ---
>> 1. Retrieve LE
>> 2. if (LE.value==null) unlink LE, return null.
>>
>> On 2 Apr 2009, at 10:00, Manik Surtani wrote:
>>
>>>
>>> On 2 Apr 2009, at 09:10, Mircea Markus wrote:
>>>
>>>> Manik Surtani wrote:
>>>>> Hello all.
>>>>>
>>>>> I have finished my work with the eviction code in Horizon, here
>>>>> is a summary of what has happened.
>>>>>
>>>>> From a user perspective (including API and configuration) as
>>>>> well as a design overview, please have a look at
>>>>> http://www.jboss.org/community/docs/DOC-13449
>>>>>
>>>>> From an implementation perspective, have a look at the srcs of
>>>>> FIFODataContainer and LRUDataContainer. These two classes are
>>>>> where everything happens. The javadocs should explain the
>>>>> details, but in a nutshell you can expect constant time
>>>>> operations for all puts, gets, removes, iterations. :-)
>>>>> Feedback on the impls would be handy. :-)
>>>> I like to concept - it's nice and clear for users to grasp. Also
>>>> took a look at implementation, nice and complex, so I'd rather
>>>> have you and a white board to understand it completely :) .
>>>> Some notes though: EvictionException is never used, same for
>>>> SpinLock.rl field.
>>>
>>> Good point, EvictionException was a carry-over from JBC which can
>>> go away.
>>>
>>> SpinLock.rl was something I used for debugging/tracking race
>>> conditions - it should go away now as well.
>>>
>>>> And a question:
>>>> with this impl, we're doing locking twice for cache operations:
>>>> on LockInterceptor and on DataContainer itself. Why is this
>>>> necessary? I think this is needed so that eviction thread and
>>>> user threads not to conflict( am I right?).
>>>
>>> Well, the locking is done on a different level. The
>>> LockingInterceptor locks a key. So keep in mind that 2 threads
>>> may lock 2 separate keys concurrently, but these may map to the
>>> same hash bucket in the data container. In which case, since the
>>> DC uses a CHM, this would mean that the same stripe is locked
>>> preventing concurrent reorganisation of the stripe.
>>>
>>>> Isn't it possible to use the same locking (i.e. LockManager) for
>>>> both these threads?
>>>
>>> Possible, but again not ideal, for example if you have TX 1 which
>>> writes to key K1. The way this currently works, here is what
>>> happens:
>>>
>>> 1. Acquire lock for key K1 in the lock manager
>>> 2. Write to K1
>>> 3. Sleep?
>>> 4. Commit:
>>> 5. Write changes to DC. This will lock the stripe in the CHM for
>>> the duration of this step only.
>>> 6. Release lock acquired in 1
>>>
>>> Now if you have a concurrent write to K2, which *happens* to be in
>>> the same stripe as K1 in the CHM, this second thread will *not*
>>> block for TX 1 since TX 1 only locks the CHM stripe for a short
>>> duration (step 5 above).
>>>
>>> Now if we used the same locks thoughout (e.g., step 1 *is* the
>>> lock stripe in the CHM) then you have far less concurrency.
>>>
>>> Of course, the above premise only holds true if you have enough
>>> shared locks for the lock manager such that K1 and K2 do not map
>>> to the same lock, even if they are in the same stripe in the CHM.
>>> Also holds true if lock striping is disabled and you have a lock
>>> per key.
>>>
>>>> (also, why are we using ConcurrentHashMaps in default container
>>>> - isn't access to data guarded by the lock interceptor?
>>>
>>> See above. :-)
>>>
>>> Cheers
>>> Manik
>>>
>>> --
>>> Manik Surtani
>>> Lead, JBoss Cache
>>> http://www.jbosscache.org
>>> manik at jboss.org
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> horizon-dev mailing list
>>> horizon-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/horizon-dev
>>
>> --
>> Manik Surtani
>> Lead, JBoss Cache
>> http://www.jbosscache.org
>> manik at jboss.org
>>
>>
>>
>>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
More information about the infinispan-dev
mailing list