[infinispan-dev] ClusteredGet and direct operations on a cacheloader

Wed Apr 22 09:48:08 EDT 2009

On 22 Apr 2009, at 14:30, Mircea Markus wrote:

> Manik Surtani wrote:
>> Mircea
>>
>> Regarding direct loading from a cache loader in the  
>> ClusteredGetCommand, this is an issue since
>
>> a) it does not acquire any locks to put this value back in the data  
>> container, and
> cache loaders are read only and stores use bucket locking, it's a  
> different locking level though...

I was referring to cache-level locking.  E.g., doing a load from a  
loader and then a put into the data container without any locking from  
the LockManager is a big no-no.  :-)

>> b) probably as a consequence of a), doesn't bother to put any  
>> looked up value in the data container at all.
>>
>> This is inefficient since multiple CGCs on the same key (coming  
>> from different cache instances in a cluster) would cause the owning  
>> cache to load this several times repeatedly from a loader.
> At the same time, unique calls for different keys would increase the  
> memory footprint... though I agree with your point of view overall:  
> after all remote cache can be considered just another cache user.
>>
>> I see this problem with DIST since I use CGC to load an entry from  
>> a remote cache and if L1 caching is disabled on the requesting  
>> cache, multiple uses of an entry will result in this entry being  
>> read off a loader repeatedly.  Which is no good.  :-)
>>
>> So I think the CGC's perform() method should actually be doing a  
>> cache.get() - or something very similar - where the entire  
>> interceptor chain is invoked and locks are acquired, entries moved  
>> to the data container if loaded, etc., and metrics on cache hits/ 
>> misses properly updated.  The challenges here are, of course:
>>
>> 1.  The ping-pong effect.  So we need to make sure any clustered  
>> cache loader or the DistributionInterceptor do not try and load  
>> this from elsewhere in the cluster.  Easily solved with the  
>> isOriginLocal flag on the context.
> yes, +1
>> 2.   Making sure we get an InternalCacheEntry and not just a value  
>> to return.  This is trickier since ICEs are not exposed via public  
>> APIs at all.  Perhaps we need a separate, non-replicable Command  
>> for this purpose - maybe a subclass of GetKeyValueCommand  
>> (GetCacheEntryCommand?) which (a) cannot be replicated and (b) will  
>> return the ICE.
> I see, it gets a bit complicated now... TBH, I think the 'problem'  
> is in the fact that our interceptors are a bit too 'fat' - with this  
> interceptor approach we "spread" the ClusterGet logic in too many  
> places:
> - in the ClusterGet command, that does the call
> - in the DistributionInterceptor, where you have to add the  
> additional if, to check that the call is local
> - (other interceptors will have to do that as well).
>
> Having thinner interceptors would move the logic in one place only  
> (e.g. LockingInterceptor is thin, as all locking logic is in  
> LockManager, and all the calls are delegated to it, no 'fat' logic  
> in interceptor itself) would make the code easier to follow and  
> change (and solve the above issue, with not having access to  
> InternalCacheEntry).
> So, the code in ClusterGetCommand might look something like this:
>
> perform() {
>   lockManager.lock(key);
>   if  (dataContainer.get(key) != null) return...;
>   if (loaderManager != null) return loaderManager.get(key);
>   lockManager.release(key);
> }

I think this is evil (!) because it effectively duplicates the  
sequence of events determined by the interceptors.  Plus in introduces  
brittleness and a lack of scalability and customisability since it  
short-circuits what other interceptors do.  E.g., what if I introduce  
a new interceptor between LockInterceptor and CacheLoaderInterceptor  
which is meant to decrypt data from the store since what is loaded is  
encrypted, and I cannot do this in the store since the key is on the  
caller's thread (maybe a JAAS Principal)?  Already this also breaks  
any cache statistics on loads, etc being collected by the JMX  
interceptors.

In general I strongly disagree with duplicating this sequence  
elsewhere.  The interceptors are self-contained and are designed to  
ignore and pass on invocations they are not interested in.

> I think the code is much slef-explanatory than a plain cache.get(),  
> that does lots of things not imediately visible.

That's the whole point.  It should *not* be visible as to what happens  
during a get() since this should be decided on configuration, custom  
interceptors being injected, etc.  There is no way you know this at  
compile time to hard code this sequence into a single  
command.perform() method.

> Also if he wants to add another lookup from somewhere else, it's  
> easier.

Why should there ever be other lookups from elsewhere?  This only  
leads to potential for breakage.

> There is no code duplication as well, as all the work is delegated  
> to the managers.

Not duplication of work but definite duplication of sequence of events.

Cheers
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org