[infinispan-dev] RPCs for non-existent caches ought not throw exception

Mon Sep 13 10:19:13 EDT 2010

On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
> I've had a brief look at this, need to spend a bit more time but here's an initial view on this,
> 
> At the moment at least, InboundInvocationHandlerImpl doesn't take
> ComponentStatus into account to see if it's up. It only checks whether the
> component registry is null, but a ComponentStatus check might make
> more sense.

After the component registry null check comes the following:

if (!cr.getStatus().allowInvocations()) {
   giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
   while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
   if (!cr.getStatus().allowInvocations()) {
      log.warn("Cache named [{0}] exists but isn't in a state to handle invocations.  Its state is {1}.", cacheName, cr.getStatus());
      return RequestIgnoredResponse.INSTANCE;
   }
}

So, there is, in fact, a ComponentStatus check.  If the registry is not
RUNNING, then we spin for up to 30 seconds for the status to become
RUNNING.  For a stopping or stopped cache, this does not seem to make
sense, since these states do not indicate that the cache is in the
process of starting.
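
To illustrate, here is a minimal, self-contained sketch of the narrower
check. It is only a sketch: the Status enum is a hypothetical stand-in for
ComponentStatus (not the real Infinispan class), and it assumes that only
the pre-RUNNING states count as "starting up". The names decide() and
Outcome are likewise made up for the example.

```java
import java.util.function.Supplier;

public class StatusWaitSketch {

    /** Hypothetical stand-in for ComponentStatus; not the real Infinispan enum. */
    public enum Status {
        INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

        public boolean allowInvocations() {
            return this == RUNNING;
        }

        /** Assumption: only states that precede RUNNING count as starting up. */
        public boolean startingUp() {
            return this == INSTANTIATED || this == INITIALIZING;
        }
    }

    /** Hypothetical result type: handle the RPC, or ignore it. */
    public enum Outcome { HANDLE, IGNORE }

    /**
     * Waits (up to timeoutMillis) only while the cache is genuinely starting.
     * A stopping or terminated cache falls straight through to IGNORE.
     */
    public static Outcome decide(Supplier<Status> status, long timeoutMillis) {
        long giveupTime = System.currentTimeMillis() + timeoutMillis;
        while (status.get().startingUp() && System.currentTimeMillis() < giveupTime) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
        }
        return status.get().allowInvocations() ? Outcome.HANDLE : Outcome.IGNORE;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Outcome outcome = decide(() -> Status.STOPPING, 30_000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(outcome + " after " + elapsed + " ms");
    }
}
```

With a status check shaped like this, a request for a stopping cache is
answered without waiting out the timeout, while a cache that is still
initializing gets the full wait.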

> When I looked at this a while back, I'd ideally have liked to be able
> to start a cache associated with the unknown cache request, however
> this is not feasible because you can't know what configuration it should
> be started with.
> 
> At first glance, a different valid status would be the way forward,
> but you have to think about the state transfer and distribution logic
> and that's the hard bit. If a cache is started in a non-coordinator,
> and the coordinator has not yet started that cache, how does state
> transfer or rehash control work? Both of them rely on some kind of
> logic running on coordinator. Now, who's the coordinator in that case?
> The coordinator is in theory the first node started, but what if the
> cache is not yet started in the coordinator? The coordinator now
> becomes a variant of the Cache rather than the CacheManager.
> 
> I think the latter is the bigger problem to solve here.

Agreed.

> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
> 
> > OK - the plot thickens...
> > RequestIgnoredResponse is not actually appropriate because it's an
> > invalid response (i.e. extends InvalidResponse).  Oops.
> > So, not only would we either need to return a valid response (perhaps
> > null, like the behavior prior to ISPN-447 ?), but an RPC for a stopped
> > (or stopping) cache should also be considered valid.  For example, if I
> > have an app deployed on 2 nodes, and I undeploy the app from node2, this
> > would cause RPC-bound cache operations to fail on node1.  Actually,
> > these RPCs would time out, since the InboundInvocationHandler will wait
> > 30 seconds for them to start.  That's no good.
> > 
> > To address this would require some changes to the behavior of some of
> > the ComponentStatus values.  For example, ComponentStatus.startingUp()
> > returns true for STOPPING and TERMINATED, and consequently
> > InboundInvocationHandler loops for 30 seconds hoping the cache will
> > start.  That doesn't seem appropriate for the use case above.  Would it
> > be possible to return a valid ignored response (e.g. null) for these
> > states?
> > 
> > Thoughts?
> > 
> > On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
> >> In AS clustering, there are several use cases where a specific cache
> >> instance may not exist (or may not be started) for every member of the
> >> group.  Currently, Infinispan treats this as an exception case, and any
> >> cache operation resulting in an RPC will fail.  This is problematic for
> >> the following AS use cases:
> >> 
> >> 1. For a given clustering service (e.g. web session, SFSBs, entity
> >> caching) there is a shared cache manager for all applications, while
> >> each application uses its own cache instance.  If I have app1 running on
> >> node1 and node2, everything is fine.  But if I deploy app2 on node1,
> >> its membership will include node2 (because of the shared cache manager)
> >> even though there is no cache instance for app2 on node2.  Consequently,
> >> the cache instances for app2 will be non-functional until app2 is
> >> deployed on node2.
> >> 2. In Hibernate's 2nd level cache, custom cache regions are created on
> >> demand.  So, even with a single app running on 2 nodes, the first
> >> request to cache an entity in a custom cache region on node1 will fail,
> >> since the cache corresponding to the region will not exist on node2.
> >> 
> >> Here's the relevant code in
> >> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
> >> 
> >> String cacheName = cmd.getCacheName();
> >> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
> >> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
> >> while (cr == null && System.currentTimeMillis() < giveupTime) {
> >>   Thread.sleep(100);
> >>   cr = gcr.getNamedComponentRegistry(cacheName);
> >> }
> >> 
> >> if (cr == null) {
> >>   if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
> >>   return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
> >> // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
> >> }
> >> 
> >> From the perspective of the AS, a request for a non-existent cache should
> >> be treated the same way as a request for a stopped cache (that logic
> >> returns RequestIgnoredResponse.INSTANCE).
> >> As Galder pointed out, handling this case via exception was an explicit
> >> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
> >> In the comments for ISPN-447, Manik seemed to suggest that returning an
> >> exception is merely a workaround until this issue is fixed:
> >> https://jira.jboss.org/browse/ISPN-434
> >> 
> >> As it stands, this is a blocker issue for AS infinispan integration.
> >> 
> >> Thoughts?
> >> 
> >> _______________________________________________
> >> infinispan-dev mailing list
> >> infinispan-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> > 
> 
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
> 