[infinispan-dev] ClusteredGet and direct operations on a cacheloader

Mircea Markus mircea.markus at jboss.com
Wed Apr 22 10:17:46 EDT 2009


Manik Surtani wrote:
>
> On 22 Apr 2009, at 14:30, Mircea Markus wrote:
>
>> Manik Surtani wrote:
>>> Mircea
>>>
>>> Regarding direct loading from a cache loader in the 
>>> ClusteredGetCommand, this is an issue since
>>
>>> a) it does not acquire any locks to put this value back in the data 
>>> container, and
>> cache loaders are read-only and stores use bucket locking; it's a 
>> different locking level though...
>
> I was referring to cache-level locking.  E.g., doing a load from a 
> loader and then a put into the data container without any locking from 
> the LockManager is a big no-no.  :-)
>
>>> b) probably as a consequence of a), it doesn't bother to put any 
>>> looked-up value in the data container at all.
>>>
>>> This is inefficient, since multiple CGCs on the same key (coming from 
>>> different cache instances in a cluster) would cause the owning cache 
>>> to load it from the loader repeatedly.
>> At the same time, unique calls for different keys would increase the 
>> memory footprint... though I agree with your point of view overall: 
>> after all, a remote cache can be considered just another cache user.
>>>
>>> I see this problem with DIST since I use CGC to load an entry from a 
>>> remote cache and if L1 caching is disabled on the requesting cache, 
>>> multiple uses of an entry will result in this entry being read off a 
>>> loader repeatedly.  Which is no good.  :-)
>>>
>>> So I think the CGC's perform() method should actually be doing a 
>>> cache.get() - or something very similar - where the entire 
>>> interceptor chain is invoked and locks are acquired, entries moved 
>>> to the data container if loaded, etc., and metrics on cache 
>>> hits/misses properly updated.  The challenges here are, of course:
>>>
>>> 1.  The ping-pong effect.  So we need to make sure any clustered 
>>> cache loader or the DistributionInterceptor do not try and load this 
>>> from elsewhere in the cluster.  Easily solved with the isOriginLocal 
>>> flag on the context.
>> yes, +1
>>> 2.   Making sure we get an InternalCacheEntry and not just a value 
>>> to return.  This is trickier since ICEs are not exposed via public 
>>> APIs at all.  Perhaps we need a separate, non-replicable Command for 
>>> this purpose - maybe a subclass of GetKeyValueCommand 
>>> (GetCacheEntryCommand?) which (a) cannot be replicated and (b) will 
>>> return the ICE.
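On 1. - agreed, something as simple as this check in the DistributionInterceptor 
should do it (just a sketch; the visit method, isOriginLocal() and 
handleLocalGet() are approximate/made-up names, not checked against trunk):

   // a get that arrived via a ClusteredGetCommand must be answered from local
   // state only (data container / loader), never by going back out to the
   // cluster - otherwise we ping-pong
   @Override
   public Object visitGetKeyValueCommand(InvocationContext ctx, GetKeyValueCommand command) throws Throwable {
      if (!ctx.isOriginLocal()) {
         // remote origin: let the rest of the chain answer locally
         return invokeNextInterceptor(ctx, command);
      }
      // local origin: usual behaviour, including a remote lookup if we don't own the key
      return handleLocalGet(ctx, command); // handleLocalGet() is a placeholder
   }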
>> I see, it gets a bit complicated now... TBH, I think the 'problem' is 
>> that our interceptors are a bit too 'fat' - with this interceptor 
>> approach we "spread" the ClusteredGet logic across too many places:
>> - in the ClusteredGetCommand, which does the call
>> - in the DistributionInterceptor, where you have to add an additional 
>> if to check that the call is local
>> - (other interceptors will have to do that as well).
>>
>> Having thinner interceptors would keep the logic in one place only 
>> (e.g. LockingInterceptor is thin: all locking logic lives in 
>> LockManager and all calls are delegated to it, with no 'fat' logic in 
>> the interceptor itself). That would make the code easier to follow and 
>> change (and would solve the above issue of not having access to 
>> InternalCacheEntry).
>> So, the code in ClusteredGetCommand might look something like this:
>>
>> perform() {
>>    lockManager.lock(key);
>>    try {
>>       Object value = dataContainer.get(key);
>>       if (value != null) return value;
>>       if (loaderManager != null) return loaderManager.get(key);
>>       return null;
>>    } finally {
>>       lockManager.release(key);
>>    }
>> }
>
> I think this is evil (!) because it effectively duplicates the 
> sequence of events determined by the interceptors.  Plus it introduces 
> brittleness and a lack of scalability and customisability, since it 
> short-circuits what other interceptors do.  E.g., what if I introduce 
> a new interceptor between LockInterceptor and CacheLoaderInterceptor 
> which is meant to decrypt data from the store, since what is loaded is 
> encrypted and I cannot do this in the store because the key is on the 
> caller's thread (maybe a JAAS Principal)?  This also breaks any cache 
> statistics on loads, etc. being collected by the JMX interceptors.
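Fair enough - such an interceptor would just slot into the chain without the 
command ever knowing about it. Roughly something like this (only a sketch; 
CommandInterceptor, visitGetKeyValueCommand(), invokeNextInterceptor() and 
the Decryptor helper are written from memory / made up, not checked against 
trunk):

   // hypothetical helper holding the caller's key (e.g. taken from a JAAS Principal)
   public interface Decryptor {
      Object decrypt(Object ciphertext);
   }

   public class DecryptingInterceptor extends CommandInterceptor {
      private final Decryptor decryptor;

      public DecryptingInterceptor(Decryptor decryptor) {
         this.decryptor = decryptor;
      }

      @Override
      public Object visitGetKeyValueCommand(InvocationContext ctx, GetKeyValueCommand command) throws Throwable {
         // let the rest of the chain (locking, data container, cache loader) run first
         Object retval = invokeNextInterceptor(ctx, command);
         // decrypt whatever came back from the store; nulls pass through untouched
         return retval == null ? null : decryptor.decrypt(retval);
      }
   }

If ClusteredGetCommand.perform() hard-coded the lock/load sequence itself, a 
remote get would bypass this interceptor completely - which is exactly the 
short-circuiting point.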
>
> In general I strongly disagree with duplicating this sequence 
> elsewhere.  The interceptors are self-contained and are designed to 
> ignore and pass on invocations they are not interested in.
>
>> I think the code is much more self-explanatory than a plain cache.get(), 
>> which does lots of things that are not immediately visible.
>
> That's the whole point.  It should *not* be visible what happens 
> during a get(), since this should be decided by configuration, custom 
> interceptors being injected, etc.  There is no way to know this at 
> compile time, so you cannot hard-code this sequence into a single 
> command.perform() method.
>
>> Also, if someone wants to add another lookup from somewhere else, it's easier.
>
> Why should there ever be other lookups from elsewhere?  This only 
> leads to potential for breakage.
>
>> There is no code duplication either, as all the work is delegated to 
>> the managers.
>
> Not duplication of work, but definitely duplication of the sequence of events.
>
You are right :)
I guess the GetCacheEntryCommand is the way to go, as you suggested.
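Something along these lines, perhaps (very rough sketch - the 
GetKeyValueCommand super constructor, ctx.lookupEntry() and the entry 
handling are guesses at this point, not actual API):

   // local-only read command that returns the whole entry instead of just the value
   public class GetCacheEntryCommand extends GetKeyValueCommand {

      public GetCacheEntryCommand(Object key) {
         super(key); // assumed super constructor
      }

      @Override
      public Object perform(InvocationContext ctx) throws Throwable {
         // by the time we get here the interceptor chain has already done the
         // locking, the data container lookup and (if needed) the loader load,
         // so the entry is in the context - hand the whole thing back
         CacheEntry entry = ctx.lookupEntry(key);
         return entry == null || entry.isNull() ? null : entry;
      }
   }

That way (a) it is only ever created locally by the ClusteredGetCommand 
handler and never marshalled, so there is no ping-pong, and (b) the requester 
gets the whole entry (the ICE) back rather than just the value.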
> Cheers
> -- 
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>



