[infinispan-dev] ISPN-699 - proper cancellation of cache store operations

Galder Zamarreño galder at redhat.com
Thu Oct 21 04:28:39 EDT 2010


On Oct 21, 2010, at 10:22 AM, Manik Surtani wrote:

> The correct vehicle for such work is the InterruptedException.  
> 
> So EvictionManager.stop() interrupting and cancelling the EvictionTask is correct behaviour.  What happens here?  The processEviction() method gets interrupted?  We just need to make sure we handle this properly.

It doesn't get interrupted properly, at least not when FileCacheStore is in the middle of purging the cache store, which is what leads to the NPE.

> 
> So the first bit in processEviction() is probably wrong:
> 
> http://fisheye.jboss.org/browse/Infinispan/branches/4.2.x/core/src/main/java/org/infinispan/eviction/EvictionManagerImpl.java?r=2525#l90
> 
> if startLatch.await() is interrupted, we shouldn't just continue with the method, but rather return.  It won't *prevent* the problem of the thread being interrupted while purging the cache store, but it will still prevent unnecessary purging from taking place.
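
To make the point about returning early concrete, here is a minimal sketch. It is not the actual EvictionManagerImpl code: the class name, the start() method, the purged flag and the runOnce() helper are all hypothetical stand-ins, and the purge itself is reduced to setting a boolean. It only shows the shape of "if await() is interrupted, restore the flag and return".

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for an eviction manager; not Infinispan's actual class.
final class LatchAwareEvictionSketch {
    private final CountDownLatch startLatch = new CountDownLatch(1);
    private boolean purged;

    void start() {
        startLatch.countDown();
    }

    void processEviction() {
        try {
            startLatch.await();
        } catch (InterruptedException e) {
            // Restore the flag so code further up the stack still sees the
            // cancellation, then give up instead of purging.
            Thread.currentThread().interrupt();
            return;
        }
        purged = true; // placeholder for purging the container and the store
    }

    // Runs one eviction pass on the calling thread, optionally with the
    // interrupt flag already set, and reports whether the purge step ran.
    static boolean runOnce(boolean interruptFirst) {
        LatchAwareEvictionSketch sketch = new LatchAwareEvictionSketch();
        sketch.start();
        if (interruptFirst) {
            Thread.currentThread().interrupt();
        }
        sketch.processEviction();
        Thread.interrupted(); // clear the flag so the caller is unaffected
        return sketch.purged;
    }

    public static void main(String[] args) {
        System.out.println("purged when interrupted: " + runOnce(true));
        System.out.println("purged normally: " + runOnce(false));
    }
}
```

CountDownLatch.await() throws InterruptedException if the interrupt flag is already set on entry, so the interrupted run never reaches the purge step. As noted in the text, this does nothing for an interrupt that arrives once purging is under way.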
> 
> Further, the block that purges the cache store is wrapped in a try-catch block.  Don't you see an interrupted exception here?

No idea what the startLatch is there for. I got rid of it because it doesn't make much sense; in particular, why count down if eviction is not enabled?

But the problem is not there. Once I get rid of the start latch, I can guard purging the container and purging the store with isInterrupted checks. The actual problem is when the interruption happens while the FCS is in the middle of purging. In the UT attached to the JIRA, this can happen relatively easily because it deals with quite a lot of data.

This is what I'm trying to solve right now: a way for the FCS to get out of its internal purge properly if interrupted.
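
One possible shape for this, purely as a sketch and not what FileCacheStore does today: check the interrupt flag once per bucket and bail out with an InterruptedException. Everything here is an assumption for illustration: the class name, purgeBuckets(), modelling buckets as Runnables, and the demo() helper that simulates stop() by interrupting the thread from inside a bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; buckets are modelled as plain Runnables.
final class InterruptiblePurgeSketch {

    // Purges buckets in order and stops at the next bucket boundary once the
    // thread has been interrupted. Returns the number of buckets purged.
    static int purgeBuckets(List<Runnable> buckets) throws InterruptedException {
        int purged = 0;
        for (Runnable bucket : buckets) {
            // Thread.interrupted() reads and clears the flag in one step.
            if (Thread.interrupted()) {
                throw new InterruptedException("purge cancelled after " + purged + " buckets");
            }
            bucket.run();
            purged++;
        }
        return purged;
    }

    // Builds `total` buckets; the bucket at index `interruptAt` interrupts the
    // current thread, simulating a stop() arriving mid-purge. Returns how many
    // buckets were actually processed.
    static int demo(int total, int interruptAt) {
        int[] processed = {0};
        List<Runnable> buckets = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            final int index = i;
            buckets.add(() -> {
                processed[0]++;
                if (index == interruptAt) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            purgeBuckets(buckets);
        } catch (InterruptedException e) {
            // Expected when the simulated stop() fires before the last bucket.
        }
        Thread.interrupted(); // clear any leftover flag so the caller is unaffected
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println("buckets processed: " + demo(1000, 2));
    }
}
```

This keeps the check in one place per loop rather than scattered through the code, but it only helps at bucket boundaries; an interrupt that lands while a single bucket is being unmarshalled is still not covered.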

> 
> 
> 
> On 20 Oct 2010, at 14:13, Galder Zamarreño wrote:
> 
>> Hi,
>> 
>> Re: https://jira.jboss.org/browse/ISPN-699
>> 
>> I'm trying to figure out the best way to solve this issue. Basically, when the cache manager is stopped, EvictionManagerImpl cancels the evictionTask with interruption, and I'm seeing cacheStore.purgeExpired() not respond to the cancellation properly. This results in the Marshaller being stopped while the eviction thread is still trying to purge the cache store. Obviously, once the marshaller is stopped, nothing can be read any more.
>> 
>> I've tried simply guarding the cacheStore.purgeExpired() call with a Thread.currentThread().isInterrupted() check, but this is not enough because we could have hundreds of buckets to check for purging, and the interruption could happen while looping through them. Now, I don't see the point of plaguing the code with Thread.currentThread().isInterrupted() checks; it'd be pointless. Instead, I wanted to share other ideas to solve this issue:
>> 
>> 1. EvictionManagerImpl could wait for any ongoing eviction task to finish. This could potentially be lengthy if, for example, the cache store has hundreds or thousands of buckets, and we don't want stop requests to block at all.
>> 
>> 2. The main problem comes from the fact that the marshaller is being asked to read something but can't do it any more since it's shutting down. An alternative would be for the ConstantObjectTable to return null when it is stopped and the thread is interrupted. This might work fine since if the bucket read returns null, it skips the bucket, but it's not optimal. If you have 1000s of buckets, it is going to continue looping through them, so you'd need something else, and this logic would need to be replicated in all cache stores.
>> 
>> 3. COT returning null might not be a good idea. Instead, the ObjectTable readObject() method could be changed to declare that it can throw an InterruptedException. This would force any caller to deal with this situation, including all current cache stores. I think this is the only way we can force cache stores to respond properly to interruption situations like the one mentioned above. Otherwise, we have to start doing esoteric things in all cache stores to check what a null return from unmarshalling means, or try to second-guess whether an IOException might be wrapping an InterruptedException.
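
As a rough illustration of option 3, the sketch below declares InterruptedException on a read method so that the compiler forces the purge loop to handle it. None of this is the real ObjectTable or cache store API: ObjectReader, StoppableReader, purge() and demo() are hypothetical names, and "stopped" is modelled as a simple volatile flag.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class InterruptibleReadSketch {

    // Hypothetical stand-in for a readObject() that declares InterruptedException.
    interface ObjectReader {
        Object readObject(byte[] raw) throws IOException, InterruptedException;
    }

    // Hypothetical reader that refuses to read once it has been stopped.
    static final class StoppableReader implements ObjectReader {
        private volatile boolean stopped;

        void stop() {
            stopped = true;
        }

        @Override
        public Object readObject(byte[] raw) throws IOException, InterruptedException {
            if (stopped) {
                throw new InterruptedException("reader stopped");
            }
            return new String(raw, StandardCharsets.UTF_8);
        }
    }

    // A cache store style loop: the checked exception cannot be ignored, so the
    // loop ends at the first failed read instead of visiting every bucket.
    static int purge(ObjectReader reader, List<byte[]> buckets) throws IOException {
        int read = 0;
        try {
            for (byte[] raw : buckets) {
                reader.readObject(raw);
                read++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the cancellation signal
        }
        return read;
    }

    // Stops the reader just before read number `stopAfter` (0-based) and
    // returns how many buckets were read successfully.
    static int demo(int total, int stopAfter) {
        StoppableReader reader = new StoppableReader();
        int[] calls = {0};
        ObjectReader stoppingMidway = raw -> {
            if (calls[0]++ == stopAfter) {
                reader.stop();
            }
            return reader.readObject(raw);
        };
        List<byte[]> buckets = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            buckets.add(("bucket-" + i).getBytes(StandardCharsets.UTF_8));
        }
        try {
            return purge(stoppingMidway, buckets);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
    }

    public static void main(String[] args) {
        System.out.println("buckets read before stop: " + demo(1000, 3));
    }
}
```

Compared with returning null, the loop cannot silently carry on through the remaining buckets, and nothing has to guess whether an IOException is hiding an interruption.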
>> 
>> I'm currently leaning towards 3.
>> 
>> Thoughts? 
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



