Ok, my suggestion below is working now, but I'm not so sure the way the activation
interceptor works is the correct one.
IMO, an activation, which involves removing an entry from the cache store (and updating
the stats), should only be done when a cache entry has been retrieved from the cache store
and put into the cache, so a special get() operation.
However, the activation interceptor is going beyond that by removing from the store when
put, replace, putAll, remove.
Remove makes sense. The entry might not be in memory and so you want to remove it from the
store.
But, put, pulAll and replace? These update the in-memory data, and don't really see
why they should remove anything from the store.
I mean, if they don't remove it from the store and the cache crashes, stale data would
be returned, unless you're a clustered cache and you'd retrieve the in-memory data
from another node. If they remove the cache store value and the cache crashes, the data is
gone, again unless it's a clustered cache. So why make an effort removing the entry
from store in these situations? It adds time in the critical path of these operations.
Thoughts?
On Oct 31, 2012, at 11:27 AM, Galder Zamarreño <galder(a)redhat.com> wrote:
On Oct 30, 2012, at 6:23 PM, Galder Zamarreño <galder(a)redhat.com> wrote:
> Hi all,
>
> Re:
https://issues.jboss.org/browse/ISPN-2384
>
> I've created a unit test in
https://github.com/galderz/infinispan/commit/01230d40df6f26720039986916c3...
>
> That was the easy part :). How to fix it is no so clear. In pseudo-code, the race
condition happens when:
>
> 1. T1. passivate entry X to cache store
> 2. T2. retrieve X from memory
> 3. T2. activation interceptor removes X from store
> 4. T1. evicts X from memory
>
> The end result is that X is gone from both memory and cache store.
>
> I've thought of several ways of fixing it, but not convinced with any:
>
> One way to fix this is by making step 1 & 4 atomic, and I was hoping to pigyback
on the segment lock, but for that to work, step 2 (data container get() op) would need to
wait for this segment lock, which would be detrimental for performance.
>
> Another way would be if activation only happened if the data was retrieved from the
cache store (and not from memory) since removing from cache store when the value came from
memory is rather pointless. The problem with this solution is that the source of the data
is not currently ship around in the interceptor stack. IOW, the activation interceptor
doesn't know if the data came from memory or the cache store. This could be
potentially recorded in the cache entry stored in the context, but requires some
refactoring and it could still be vunerable to this sequence of events:
>
> 1. T1. passivate entry X to cache store
> 2. T2. retrieve X from cache store
> 3. T2. store X in memory
> 4. T2. activation interceptor removes X from store
> 5. T1. evicts X from memory
>
> So, to get around this sequence of events, since the the store acquires a segment
lock, passivate and evict X could be done within the segment lock, and that would make
sure that the evict happens before the storage in memory:
>
> 1. T1. acquire segment lock
> 2. T1. passivate entry X to cache store
> 3. T2. retrieve X from cache store
> 4. T1. evicts X from memory
> 5. T1. releases segment lock
> 6. T2. acquires segment lock
> 7. T2. store X in memory
> 8. T2. activation interceptor removes X from store
^ Actually, don't think this won't work as is, because the activation interceptor
logic is outside the data container and hence no way to protect under the segment lock.
Another important point here is that the key being evicted is not the one for which the
per-key lock has been acquired, so passivating it is not managed by per key lock. Only
when is evicted from memory the segment lock protects it.
Moving the activation logic closed to the data ctaoner, to be more simmetric with the
passivation logic, is doable, but would bloat the BoundedConcurrentHashMap, and passing
information at this level on whether the data comes from the cache store (as opposed to
coming from another node, or being the result of a put) doesn't seem straightforward,
and remember this optimization would be important to minimise cache store removal
operations (compared to current constant cache store removals), and hence reduce the
duration while the segment lock is held.
> 9, a. T2. releases segment lock
>
> I think this could work, but I was wondering if you could see other potential
solutions?
>
> Cheers,
> --
> Galder Zamarreño
> galder(a)redhat.com
>
twitter.com/galderz
>
> Project Lead, Escalante
>
http://escalante.io
>
> Engineer, Infinispan
>
http://infinispan.org
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org