On 22 Apr 2009, at 14:30, Mircea Markus wrote:
Manik Surtani wrote:
> Mircea
>
> Regarding direct loading from a cache loader in the
> ClusteredGetCommand, this is an issue since
> a) it does not acquire any locks to put this value back in the data
> container, and
cache loaders are read only and stores use bucket locking, it's a
different locking level though...
I was referring to cache-level locking. E.g., doing a load from a
loader and then a put into the data container without any locking from
the LockManager is a big no-no. :-)
> b) probably as a consequence of a), doesn't bother to put any
> looked up value in the data container at all.
>
> This is inefficient since multiple CGCs on the same key (coming
> from different cache instances in a cluster) would cause the owning
> cache to load this several times repeatedly from a loader.
At the same time, unique calls for different keys would increase the
memory footprint... though I agree with your point of view overall:
after all remote cache can be considered just another cache user.
>
> I see this problem with DIST since I use CGC to load an entry from
> a remote cache and if L1 caching is disabled on the requesting
> cache, multiple uses of an entry will result in this entry being
> read off a loader repeatedly. Which is no good. :-)
>
> So I think the CGC's perform() method should actually be doing a
> cache.get() - or something very similar - where the entire
> interceptor chain is invoked and locks are acquired, entries moved
> to the data container if loaded, etc., and metrics on cache hits/
> misses properly updated. The challenges here are, of course:
>
> 1. The ping-pong effect. So we need to make sure any clustered
> cache loader or the DistributionInterceptor do not try and load
> this from elsewhere in the cluster. Easily solved with the
> isOriginLocal flag on the context.
yes, +1
> 2. Making sure we get an InternalCacheEntry and not just a value
> to return. This is trickier since ICEs are not exposed via public
> APIs at all. Perhaps we need a separate, non-replicable Command
> for this purpose - maybe a subclass of GetKeyValueCommand
> (GetCacheEntryCommand?) which (a) cannot be replicated and (b) will
> return the ICE.
I see, it gets a bit complicated now... TBH, I think the 'problem'
is in the fact that our interceptors are a bit too 'fat' - with this
interceptor approach we "spread" the ClusterGet logic in too many
places:
- in the ClusterGet command, that does the call
- in the DistributionInterceptor, where you have to add the
additional if, to check that the call is local
- (other interceptors will have to do that as well).
Having thinner interceptors would move the logic in one place only
(e.g. LockingInterceptor is thin, as all locking logic is in
LockManager, and all the calls are delegated to it, no 'fat' logic
in interceptor itself) would make the code easier to follow and
change (and solve the above issue, with not having access to
InternalCacheEntry).
So, the code in ClusterGetCommand might look something like this:
perform() {
lockManager.lock(key);
if (dataContainer.get(key) != null) return...;
if (loaderManager != null) return loaderManager.get(key);
lockManager.release(key);
}
I think this is evil (!) because it effectively duplicates the
sequence of events determined by the interceptors. Plus in introduces
brittleness and a lack of scalability and customisability since it
short-circuits what other interceptors do. E.g., what if I introduce
a new interceptor between LockInterceptor and CacheLoaderInterceptor
which is meant to decrypt data from the store since what is loaded is
encrypted, and I cannot do this in the store since the key is on the
caller's thread (maybe a JAAS Principal)? Already this also breaks
any cache statistics on loads, etc being collected by the JMX
interceptors.
In general I strongly disagree with duplicating this sequence
elsewhere. The interceptors are self-contained and are designed to
ignore and pass on invocations they are not interested in.
I think the code is much slef-explanatory than a plain cache.get(),
that does lots of things not imediately visible.
That's the whole point. It should *not* be visible as to what happens
during a get() since this should be decided on configuration, custom
interceptors being injected, etc. There is no way you know this at
compile time to hard code this sequence into a single
command.perform() method.
Also if he wants to add another lookup from somewhere else, it's
easier.
Why should there ever be other lookups from elsewhere? This only
leads to potential for breakage.
There is no code duplication as well, as all the work is delegated
to the managers.
Not duplication of work but definite duplication of sequence of events.
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org