[infinispan-dev] Need help

Mon Oct 7 10:15:34 EDT 2013

On 10/07/2013 02:19 PM, Sanne Grinovero wrote:
> On 7 October 2013 13:20, Pedro Ruivo <pedro at infinispan.org> wrote:
>>
>>
>> On 10/07/2013 11:43 AM, Sanne Grinovero wrote:
>>>
>>>
>>> Proposal:
>>> let's be more aggressive on validation.
>>>    - make preload mandatory (for the metadata only)
>>>    - eviction not allowed (for the metadata only)
>>>
>>> Note that I don't think we're putting much of a restriction to users,
>>> we're more likely helping
>>> with good guidance on how these caches are supposed to work.
>>> I think it's a good thing to narrow down the possible configuration
>>> options
>>> and make it clear what we expect people to use for this purpose.
>>
>>
>> it's good enough for me. :)
>>
>>
>>>
>>>
>>>>> 3## The CacheLoader loading the same entry multiple times in parallel
>>>>> Kudos for finding out that there are situations in which we deal with
>>>>> multiple different instances of ConcurrentHashSet! Still, I think that
>>>>> Infinispan core is terribly wrong in this case:
>>>>> from the client code POV a new CHS is created with a put-if-absent
>>>>> atomic operation, and I will assume there that core will check/load
>>>>> any enabled cachestore as well.
>>>>> To handle multiple GET operations in parallel, or even in parallel
>>>>> with preload or the client's put-if-absent operation, I would *not*
>>>>> expect Infinispan core to storm the CacheStore implementation with
>>>>> multiple load operations on the same put: a lock should be hold on the
>>>>> potential key during such a load operation.
>>>>
>>>>
>>>>
>>>> I didn't understand what you mean here ^^. Infinispan only tries to load
>>>> from a cache store once per operation (get, put, remove, etc...).
>>>>
>>>> however, I think we have a big windows in which multiple operations can
>>>> load
>>>> the same key from the store. This happens because we only write to the
>>>> data
>>>> container after the operation end.
>>>
>>>
>>> I think that's a very important consideration.
>>>
>>> Imagine a REST cache is built on Infinispan, using a filesystem based
>>> CacheStore, and we receive a million requests per second
>>> for a key "X1", which doesn't exist.
>>> Since each request is not going to find it in the data container, each
>>> request is triggering a CacheStore load.
>>> Do you want to enqueue a million IO operations to the disk?
>>> I'd much rather have some locking in memory, that will prevent nut
>>> just disks from struggling but also allow the CPUs to context switch
>>> to perform more useful tasks: being blocked on a lock is a good thing
>>> in this case, rather than allowing "parallel processing of pointless
>>> requests". That's just parallel energy burning, and potentially DOSing
>>> yourself regarding more interesting requests which could be handled
>>> instead.
>>>
>>> To translate it into an implementation design, you need to lock the
>>> "X1" key before attempting to load it from the CacheStore, and when
>>> the lock is acquired the data container needs to be re-checked as it
>>> might have been loaded by a parallel thread.
>>> Or alternatively, lock the key before attempting to read from the data
>>> container: that's not too bad, it's a very brief local only lock.
>>
>>
>> you have good point here, but IMO this is a high risk operation because we
>> are getting close in the final release. and I don't want to make the cache
>> store unusable...
>
> I understand there is risk, but this should have been done long ago.
> AFAIR one of the goals for 6.0 is to improve performance of CacheStores,
> this is one of the many changes going in that direction.
> Besides, I wouldn't be too worried as the CacheStores API is entirely
> new and most people will only provide feedback after the Final release,
> so this is not an entirely unfortunate timing.
>
>>
>> but, let's see what others think about it.
>>
>> my 1 cent pseudo-code suggestion:
>>
>> lock(key) {
>>    value = dataContainer.get(key)
>>    if value is not null; then return value
>>    loaded = cacheLoader.load(key)
>>    if loaded is null; then return null
>>    existing = dataContainer.putIfAbsent(key, loaded)
>>    if existing is null; then return loaded; otherwise existing
>>
>> }
>
> +1 for something similar. Not sure why you need a putIfAbsent since
> you have the lock, but that's a detail.

because I'm assuming that the lock(key) is an internal lock of the 
PersistenceManager and not a lock from LockManager. And, in this case, 
we can have a thread trying to update the same key and I want to avoid 
to put in data container and old value.

I would like to avoid the LockManager because in transactional caches, 
the application can do cache.getAdvancedCache().lock(key) and after 
that, no one can read that key again.

>
> An alternative with a lockfree fast-path:
>
> value = dataContainer.get(key)
> if (value!=null) return value
> lock(key) {
>      value = dataContainer.get(key)
>      if value is not null; then return value
>      loaded = cacheLoader.load(key)
>      if loaded is null; then return null
>      existing = dataContainer.put(key, loaded)
>      return loaded
> }

+1 we already check the data container before trying to load it :)

>
> Wouldn't be much of a benefit for the example I made above until we
> switch to using a placeholder token (tombstone) for null values,
> but there is great value in this pattern for existing entries.
>
> Sanne
>