[infinispan-dev] Need help

Sanne Grinovero sanne at infinispan.org
Mon Oct 7 09:19:42 EDT 2013


On 7 October 2013 13:20, Pedro Ruivo <pedro at infinispan.org> wrote:
>
>
> On 10/07/2013 11:43 AM, Sanne Grinovero wrote:
>>
>>
>> Proposal:
>> let's be more aggressive on validation.
>>   - make preload mandatory (for the metadata only)
>>   - eviction not allowed (for the metadata only)
>>
>> Note that I don't think we're putting much of a restriction on users;
>> we're more likely helping them with good guidance on how these caches
>> are supposed to work.
>> I think it's a good thing to narrow down the possible configuration
>> options
>> and make it clear what we expect people to use for this purpose.
>
>
> it's good enough for me. :)
>
>
>>
>>
>>>> 3## The CacheLoader loading the same entry multiple times in parallel
>>>> Kudos for finding out that there are situations in which we deal with
>>>> multiple different instances of ConcurrentHashSet! Still, I think that
>>>> Infinispan core is terribly wrong in this case:
>>>> from the client code POV a new CHS is created with a put-if-absent
>>>> atomic operation, and I assume that core will check/load
>>>> any enabled cachestore as well.
>>>> To handle multiple GET operations in parallel, or even in parallel
>>>> with preload or the client's put-if-absent operation, I would *not*
>>>> expect Infinispan core to storm the CacheStore implementation with
>>>> multiple load operations on the same key: a lock should be held on the
>>>> potential key during such a load operation.
>>>
>>>
>>>
>>> I didn't understand what you mean here ^^. Infinispan only tries to load
>>> from a cache store once per operation (get, put, remove, etc...).
>>>
>>> however, I think we have a big window in which multiple operations can
>>> load the same key from the store. This happens because we only write to
>>> the data container after the operation ends.
>>
>>
>> I think that's a very important consideration.
>>
>> Imagine a REST cache is built on Infinispan, using a filesystem based
>> CacheStore, and we receive a million requests per second
>> for a key "X1", which doesn't exist.
>> Since no request is going to find it in the data container, every
>> single one triggers a CacheStore load.
>> Do you want to enqueue a million IO operations to the disk?
>> I'd much rather have some locking in memory: that will not just keep
>> the disks from struggling but will also allow the CPUs to context
>> switch to perform more useful tasks. Being blocked on a lock is a good
>> thing in this case, rather than allowing "parallel processing of
>> pointless requests": that's just parallel energy burning, and it
>> potentially DoSes your own node, starving more interesting requests
>> which could be handled instead.
>>
>> To translate it into an implementation design, you need to lock the
>> "X1" key before attempting to load it from the CacheStore, and when
>> the lock is acquired the data container needs to be re-checked as it
>> might have been loaded by a parallel thread.
>> Or alternatively, lock the key before attempting to read from the data
>> container: that's not too bad, as it's a very brief, local-only lock.
>
>
> you have a good point here, but IMO this is a high-risk change because we
> are getting close to the final release, and I don't want to make the cache
> store unusable...

I understand there is risk, but this should have been done long ago.
AFAIR one of the goals for 6.0 is to improve the performance of
CacheStores, and this is one of the many changes going in that direction.
Besides, I wouldn't be too worried: the CacheStore API is entirely
new and most people will only provide feedback after the Final release,
so the timing is not entirely unfortunate.

>
> but, let's see what others think about it.
>
> my 1 cent pseudo-code suggestion:
>
> lock(key) {
>   value = dataContainer.get(key)
>   if value is not null; then return value
>   loaded = cacheLoader.load(key)
>   if loaded is null; then return null
>   existing = dataContainer.putIfAbsent(key, loaded)
>   if existing is null; then return loaded; otherwise return existing
>
> }

+1 for something similar. Not sure why you need a putIfAbsent since
you have the lock, but that's a detail.

An alternative with a lock-free fast path:

value = dataContainer.get(key)
if value is not null; then return value
lock(key) {
    value = dataContainer.get(key)
    if value is not null; then return value
    loaded = cacheLoader.load(key)
    if loaded is null; then return null
    dataContainer.put(key, loaded)
    return loaded
}
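
To make that concrete, here's a minimal Java sketch of the double-checked
pattern, assuming a Java 8 ConcurrentHashMap whose computeIfAbsent provides
the per-key locking; the CacheLoader interface below is just a stand-in for
illustration, not our real SPI:

import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: coalesce CacheStore loads per key so that concurrent
// GETs for the same cold key trigger at most one store read.
final class LoadCoalescingContainer<K, V> {

    // Stand-in for the real CacheLoader SPI, for illustration only.
    interface CacheLoader<K, V> {
        V load(K key);
    }

    private final ConcurrentHashMap<K, V> dataContainer = new ConcurrentHashMap<>();
    private final CacheLoader<K, V> cacheLoader;

    LoadCoalescingContainer(CacheLoader<K, V> cacheLoader) {
        this.cacheLoader = cacheLoader;
    }

    V get(K key) {
        // Lock-free fast path: check the in-memory container first.
        V value = dataContainer.get(key);
        if (value != null) return value;
        // computeIfAbsent locks only this key's bin, so parallel callers
        // block here instead of each issuing its own CacheStore load.
        // If load(key) returns null, nothing is cached and the next miss
        // hits the store again -- hence the tombstone remark below.
        return dataContainer.computeIfAbsent(key, cacheLoader::load);
    }
}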

This wouldn't help much with the example I made above until we switch
to using a placeholder token (a tombstone) for null values, but there
is great value in this pattern for existing entries.
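
For the record, a hypothetical sketch of such a tombstone, in the same
shape as the class above: a sentinel object marks keys known to be
missing, so repeated GETs for an absent key like "X1" are answered from
memory. Again just an illustration, not a design:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical tombstone variant: a sentinel marks keys known to be
// missing, so repeated misses stop hammering the CacheStore.
final class TombstoneContainer<K, V> {

    private static final Object NULL_TOKEN = new Object(); // the tombstone

    private final ConcurrentHashMap<K, Object> dataContainer = new ConcurrentHashMap<>();
    private final Function<K, V> cacheLoader; // stand-in for CacheLoader.load

    TombstoneContainer(Function<K, V> cacheLoader) {
        this.cacheLoader = cacheLoader;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        Object value = dataContainer.get(key); // lock-free fast path
        if (value == null) {
            // At most one caller loads; a null result is cached as the token.
            value = dataContainer.computeIfAbsent(key, k -> {
                V loaded = cacheLoader.apply(k);
                return loaded != null ? loaded : NULL_TOKEN;
            });
        }
        return value == NULL_TOKEN ? null : (V) value;
    }
}

A real implementation would of course also have to invalidate the
tombstone on writes and removals, which is exactly where it stops
being a detail.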

Sanne

