[infinispan-dev] Fwd: CloudCacheStore Bug

Manik Surtani manik at jboss.org
Thu Feb 4 09:17:09 EST 2010


On 4 Feb 2010, at 14:06, Philippe Van Dyck wrote:

> Manik, it is asynchronous, by default.
> 
> So I definitely think there is something rotten in the kingdom of Eviction.
> Since :
> 1) Eviction is fired in a new thread (I can disable this, no effect)
> 2) It removes entries from the datacontainer
> 3) While the datacontainer is 'maybe' asynchronously trying to write entries in the cachestore (using multiple threads!!)

^^ That's what I meant by eviction having no relationship to the synchronicity of the cache store.

> It makes a lot of threads, acting on the same data, at the same time... perfect for race conditions.

That's only if your cache store is async.

> Right now, I am losing data - it is LOST (not written to the cachestore and not available in the datacontainer)
> 
> So I think that somehow, entries are evicted from the datacontainer... and the updates to the cachestore are lost somewhere.

I'm trying to think of how this can be.  Worker threads add data and enqueue it in the async cache store queue for flushing, while the eviction thread removes entries from the data container *only*.

*Perhaps* what you see is a race where you have:

1 add item to data container
2 enqueue in async cache store for storage
3 evict in memory
4 attempt a get

where steps 1 - 4 happen *before* the async cache store can flush its queue to disk.  So this would result in the thread in 4 consulting the data container, not finding the entry, then checking the cache store and not finding it there either since it hasn't been flushed yet.  
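
To make the sequence above concrete, here is a minimal single-threaded sketch of it.  This is a toy model only: the class and its fields are hypothetical stand-ins and not Infinispan's actual data container or async store, and the flush is an explicit call rather than a background thread so that the ordering is deterministic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model (hypothetical names, not Infinispan classes).
class AsyncStoreRace {
    final Map<String, String> dataContainer = new HashMap<>(); // in-memory entries
    final Queue<String[]> asyncQueue = new ArrayDeque<>();     // writes waiting to be flushed
    final Map<String, String> store = new HashMap<>();         // stands in for the disk

    void put(String key, String value) {
        dataContainer.put(key, value);              // step 1: add to data container
        asyncQueue.add(new String[] {key, value});  // step 2: enqueue for async storage
    }

    void evict(String key) {
        dataContainer.remove(key);                  // step 3: evict from memory only
    }

    String get(String key) {                        // step 4: memory first, then store
        String value = dataContainer.get(key);
        return value != null ? value : store.get(key);
    }

    void flush() {                                  // what the async thread would do later
        String[] entry;
        while ((entry = asyncQueue.poll()) != null) {
            store.put(entry[0], entry[1]);
        }
    }

    public static void main(String[] args) {
        AsyncStoreRace cache = new AsyncStoreRace();
        cache.put("k", "v");
        cache.evict("k");
        System.out.println("before flush: " + cache.get("k")); // prints "before flush: null"
        cache.flush();
        System.out.println("after flush: " + cache.get("k"));  // prints "after flush: v"
    }
}
```

In this model the entry is not lost, it is just invisible until the queue has been flushed.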

Now IMO this is normal behaviour - the price you pay for asynchronously writing to a store.  But perhaps this window can be reduced by looking through the async queue as well, before checking the underlying store.  As I said, though, this only reduces the size of the window and does not eliminate it altogether, since this is async and there is no guarantee that the cache store has finished writing internally (e.g., an fsync() operation or, in the case of S3, Amazon's eventual consistency model).
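
A rough sketch of what "looking through the async queue as well" could mean, again as a toy model with hypothetical names rather than the real async store code.  It assumes pending writes are kept in a map keyed by entry key, so a read can find a value that is queued but not yet flushed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (hypothetical names, not Infinispan classes).
class QueueAwareLookup {
    final Map<String, String> dataContainer = new HashMap<>();
    final Map<String, String> pendingWrites = new LinkedHashMap<>(); // key -> latest queued value
    final Map<String, String> store = new HashMap<>();               // stands in for the disk

    void put(String key, String value) {
        dataContainer.put(key, value);
        pendingWrites.put(key, value);   // queued for the async flush
    }

    void evict(String key) {
        dataContainer.remove(key);       // memory only, the queue is untouched
    }

    String get(String key) {
        String value = dataContainer.get(key);
        if (value == null) {
            value = pendingWrites.get(key); // queued but not flushed yet
        }
        if (value == null) {
            value = store.get(key);         // finally, the underlying store
        }
        return value;
    }

    void flush() {
        store.putAll(pendingWrites);     // write first, then forget the queued copies
        pendingWrites.clear();
    }

    public static void main(String[] args) {
        QueueAwareLookup cache = new QueueAwareLookup();
        cache.put("k", "v");
        cache.evict("k");
        System.out.println("before flush: " + cache.get("k")); // prints "before flush: v"
        cache.flush();
        System.out.println("after flush: " + cache.get("k"));  // prints "after flush: v"
    }
}
```

With a real store the window is still there: once an entry leaves the queue, the store itself may not have made it readable yet.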

> I am digging a lot through the code but the good news is that it is very easy to reproduce, use a config like this (note the eviction stuff) :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> 	xmlns="urn:infinispan:config:4.0">
> 	<global>
> 		<transport
> 			transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
> 			<properties>
> 				<property name="configurationFile" value="jgroups.xml" />
> 			</properties>
> 		</transport>
> 
> 	</global>
> 
> 	<namedCache name="qi4j">
> 		<transaction
> 			transactionManagerLookupClass="org.infinispan.transaction.lookup.DummyTransactionManagerLookup" />
> 		<clustering mode="distribution">
> 			<l1 enabled="true" lifespan="100000" />
> 			<hash numOwners="1" rehashRpcTimeout="120000" />
> 		</clustering>
> 
> 		<loaders passivation="false" shared="true" preload="false">
> 
> 			<loader class="org.infinispan.loaders.file.FileCacheStore"
> 				fetchPersistentState="false" ignoreModifications="false"
> 				purgeOnStartup="true" purgeSynchronously="true">
> 				<properties>
> 					<property name="location" value="/tmp" />
> 				</properties>
> 				<async enabled="true" threadPoolSize="1" />
> 			</loader>
> 
> 			</loaders>
> 		
> 		<deadlockDetection enabled="true" spinDuration="1000"></deadlockDetection>
> 
> 		<eviction strategy="FIFO" wakeUpInterval="1000" maxEntries="2" />
> 
> 		<unsafe unreliableReturnValues="true" />
> 
> 	</namedCache>
> </infinispan>
> 
> I don't know if it is related to transactions... I will now try to fire eviction manually, as a workaround.
> 
> 
> Something that bothers me is the lack of transactional eviction... is it difficult to make it transactional ? And then commit the whole transaction to the cachestore and after completion only, delete the entries from the datacontainer ??

Why should eviction be transactional?  I don't need eviction to be an all-or-nothing, reversible event. :)  If an entry gets evicted, cool.  If not (for whatever reason), too bad, move on to the next evictable entry.  
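
As an illustration of that best-effort idea, here is a small sketch.  It is not the actual eviction manager: the method and the canEvict predicate (standing in for "could we get the lock on this entry") are hypothetical, and a LinkedHashMap in insertion order is assumed as a simple FIFO.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model (hypothetical names, not Infinispan classes).
class BestEffortEviction {
    // Walks entries oldest-first and evicts until the map holds at most maxEntries.
    // An entry that cannot be evicted right now is skipped, not retried or rolled back.
    static <K, V> List<K> evict(LinkedHashMap<K, V> data, int maxEntries, Predicate<K> canEvict) {
        List<K> evicted = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = data.entrySet().iterator();
        while (data.size() > maxEntries && it.hasNext()) {
            K key = it.next().getKey();
            if (!canEvict.test(key)) {
                continue;            // too bad, move on to the next evictable entry
            }
            it.remove();
            evicted.add(key);
        }
        return evicted;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> data = new LinkedHashMap<>();
        data.put("a", 1);
        data.put("b", 2);
        data.put("c", 3);
        data.put("d", 4);
        // Pretend "a" is locked by another thread and cannot be evicted.
        List<String> evicted = evict(data, 2, key -> !key.equals("a"));
        System.out.println("evicted: " + evicted);          // prints "evicted: [b, c]"
        System.out.println("remaining: " + data.keySet());  // prints "remaining: [a, d]"
    }
}
```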

Cheers
Manik

> 
> Looks like a design issue ? WDYT ?
> 
> 
> Cheers,
> 
> Phil
> 
> 
> On Thu, Feb 4, 2010 at 10:44 AM, Manik Surtani <manik at jboss.org> wrote:
> That is strange since there is no correlation between eviction and the synchronicity of cache stores.  Have you got a reproducible test for this?
> 
> Cheers
> Manik
> 
> On 3 Feb 2010, at 18:37, Philippe Van Dyck wrote:
> 
>> Thanks Manik,
>> 
>> I have another problem with eviction, it seems to destroy cache entries, only when I use async.
>> 
>> Of course, all updates are transactional.
>> 
>> Where should I search for clues ? Any idea ?
>> 
>> Here is my config:
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> 
>> <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> 	xmlns="urn:infinispan:config:4.0">
>> 	<global>
>> 		<transport
>> 			transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
>> 			<properties>
>> 				<property name="configurationFile" value="jgroups.xml" />
>> 			</properties>
>> 		</transport>
>> 
>> 	</global>
>> 
>> 	<namedCache name="qi4j">
>> 		<transaction
>> 			transactionManagerLookupClass="org.infinispan.transaction.lookup.DummyTransactionManagerLookup" />
>> 		<clustering mode="distribution">
>> 			<l1 enabled="true" lifespan="100000" />
>> 			<hash numOwners="1" rehashRpcTimeout="120000" />
>> 		</clustering>
>> 
>> 		<loaders passivation="false" shared="true" preload="false">
>> 
>> 			<loader class="org.infinispan.loaders.file.FileCacheStore"
>> 				fetchPersistentState="false" ignoreModifications="false"
>> 				purgeOnStartup="true">
>> 				<properties>
>> 					<property name="location" value="/tmp" />
>> 				</properties>
>> 				<async enabled="true" threadPoolSize="3" />
>> 			</loader>
>> 
>> 			</loaders>
>> 		
>> 		<deadlockDetection enabled="true" spinDuration="1000"></deadlockDetection>
>> 
>> 		<eviction strategy="FIFO" wakeUpInterval="1000" maxEntries="10" />
>> 
>> 		<unsafe unreliableReturnValues="true" />
>> 
>> 	</namedCache>
>> </infinispan>
>> 
>> 
>> phil
>> 
>> 
>> 
>> On Wed, Feb 3, 2010 at 6:42 PM, Manik Surtani <manik at jboss.org> wrote:
>> Ugh, good point.  I thought the unit tests would have trapped a dumb-ass mistake like this.
>> 
>> The reason for transforming the name of the bucket is that we usually use hashcodes as the bucket name, which can range from Integer.MIN_VALUE to Integer.MAX_VALUE.  These are then translated into Strings, and this becomes the name of the storage unit, e.g., 12345.bucket in the FileCacheStore.  Now filesystems are happy to accept a -12345.bucket but certain cloud storage providers barf when encountering the '-' character.  Hence the transformation to A12345.bucket in some cases.
>> 
>> Cheers
>> Manik
>> 
>> PS: pushing up a new snapshot as I type, containing this fix + lower verbosity on eviction-related lock timeouts.
>> 
>> On 3 Feb 2010, at 17:16, Philippe Van Dyck wrote:
>> 
>>> And BTW, why do it ?
>>> 
>>> p
>>> 
>>> ---------- Forwarded message ----------
>>> From: Philippe Van Dyck <pvdyck at gmail.com>
>>> Date: Wed, Feb 3, 2010 at 6:15 PM
>>> Subject: CloudCacheStore Bug
>>> To: infinispan -Dev List <infinispan-dev at lists.jboss.org>
>>> 
>>> 
>>> Hi all,
>>> 
>>> there is a bug in CloudCacheStore that makes me feel like I am the only one using it ;-)
>>> 
>>> in CR4 : if you change the "-" sign to "A" in getBucketName ... you need to do the opposite somewhere (or call it every time) ;-)
>>> 
>>> WDYT ?
>>> 
>>> p
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> 
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
>> 

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org



