[infinispan-dev] RPCs for non-existent caches ought not throw exception

Mon Sep 13 09:45:42 EDT 2010

On Sep 13, 2010, at 3:05 PM, Galder Zamarreño wrote:

> I've had a brief look at this and need to spend a bit more time on it, but here's an initial view:
> 
> At the moment at least, InboundInvocationHandlerImpl doesn't take ComponentStatus into account to see whether the cache is up. It only checks whether the component registry is null, but a ComponentStatus check might make more sense.
> 
> When I looked at this a while back, I'd ideally have liked to be able to start a cache associated with the unknown cache request; however, this is not feasible cos you can't know what configuration it should be started with.
> 
> At first glance, a different valid status would be the way forward, but you have to think about the state transfer and distribution logic and that's the hard bit. If a cache is started in a non-coordinator, and the coordinator has not yet started that cache, how does state transfer or rehash control work? Both of them rely on some kind of logic running on coordinator. Now, who's the coordinator in that case? The coordinator is in theory the first node started, but what if the cache is not yet started in the coordinator? The coordinator now becomes a variant of the Cache rather than the CacheManager.
> 
> I think the latter is the bigger problem to solve here.

And this would require some work. Relying on JGroups to decide the transport would not be enough. We'd need to figure out (via RPC?) whether this cache is the first one started in the cluster. I'm not very keen on this as I get the feeling we'd be re-inventing something. IOW, we'd be re-implementing discovery + view management at the Cache level.
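
To illustrate what a per-cache coordinator could mean, here's a rough sketch. It is not Infinispan or JGroups code: the class, the method and the node names are made up, and the set of members running the cache is simply passed in, whereas finding that set (via RPC or otherwise) is exactly the open question above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: picks a coordinator per cache instead of per CacheManager.
class CacheCoordinatorSketch {

   /**
    * Returns the first member of the cluster view (assumed ordered oldest first)
    * that is running the given cache, or empty if no member runs it yet.
    */
   static Optional<String> cacheCoordinator(List<String> view, Set<String> membersRunningCache) {
      for (String member : view) {
         if (membersRunningCache.contains(member)) {
            return Optional.of(member);
         }
      }
      return Optional.empty();
   }

   public static void main(String[] args) {
      // node1 is the cluster-wide coordinator but has not started the cache.
      List<String> view = Arrays.asList("node1", "node2", "node3");
      Set<String> running = new HashSet<>(Arrays.asList("node2", "node3"));

      System.out.println(cacheCoordinator(view, running));              // Optional[node2]
      System.out.println(cacheCoordinator(view, new HashSet<String>())); // Optional.empty
   }
}
```

With node1 first in the view but not running the cache, node2 ends up coordinating state transfer for that cache, which is the "coordinator per Cache rather than per CacheManager" situation described in the quoted text.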

At the end of the day, the reason for throwing the exception is to get around scenarios like the one explained in https://jira.jboss.org/browse/ISPN-434. So, to stop throwing it, we need to find a way to solve that type of use case.
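
For reference, here's roughly the shape a status-aware decision in the inbound handler could take if the exception went away. This is only a sketch under assumptions: CacheState and Action are hypothetical stand-ins (not the real ComponentStatus or response classes), and NOT_PRESENT models the case where the component registry is null. Whether ignoring a request for a missing cache is safe depends on solving the ISPN-434 scenarios, which this does not attempt.

```java
// Hypothetical sketch: maps a cache's lifecycle state to what the handler does
// with an incoming RPC, instead of looping for 30 seconds or throwing.
class StatusAwareHandlingSketch {

   /** Stand-in for the cache lifecycle; NOT_PRESENT models a null component registry. */
   enum CacheState { NOT_PRESENT, STARTING, RUNNING, STOPPING, TERMINATED }

   /** What the inbound handler would do with the command. */
   enum Action { PROCESS, WAIT_FOR_START, IGNORE }

   static Action decide(CacheState state) {
      switch (state) {
         case RUNNING:
            return Action.PROCESS;        // normal case: execute the command
         case STARTING:
            return Action.WAIT_FOR_START; // only here is a bounded wait worthwhile
         default:
            // NOT_PRESENT, STOPPING, TERMINATED: reply straight away with a
            // valid "ignored" response rather than waiting or throwing.
            return Action.IGNORE;
      }
   }

   public static void main(String[] args) {
      for (CacheState state : CacheState.values()) {
         System.out.println(state + " -> " + decide(state));
      }
   }
}
```

The point is just that only a cache that is genuinely starting justifies waiting. A stopping or terminated cache would be answered immediately, which avoids the 30 second timeout in the undeploy case from the quoted text.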

A temporary, not pretty, workaround would be to have a separate CacheManager per app. In the case of Hibernate this would result in serious overhead, cos you'd have to have a CacheManager per Cache to avoid this type of issue.

> 
> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
> 
>> OK - the plot thickens...
>> RequestIgnoredResponse is not actually appropriate because it's an
>> invalid response (i.e. extends InvalidResponse).  Oops.
>> So, not only would we either need to return a valid response (perhaps
>> null, like the behavior prior to ISPN-447 ?), but an RPC for a stopped
>> (or stopping) cache should also be considered valid.  For example, if I
>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>> would cause RPC-bound cache operations to fail on node1.  Actually,
>> these RPCs would time out, since the InboundInvocationHandler will wait
>> 30 seconds for them to start.  That's no good.
>> 
>> To address this would require some changes to the behavior of some of
>> the ComponentStatus values.  For example, ComponentStatus.startingUp()
>> returns true for STOPPING and TERMINATED, and consequently
>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>> start.  That doesn't seem appropriate for the use case above.  Would it
>> be possible to return a valid ignored response (e.g. null) for these
>> states?
>> 
>> Thoughts?
>> 
>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>>> In AS clustering, there are several use cases where a specific cache
>>> instance may not exist (or may not be started) for every member of the
>>> group.  Currently, Infinispan treats this as an exception case, and any
>>> cache operation resulting in an RPC will fail.  This is problematic for
>>> the following AS use cases:
>>> 
>>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>>> caching) there is a shared cache manager for all applications, while
>>> each application uses its own cache instance.  If I have app1 running on
>>> node1 and node2, everything is fine.  But if I deploy app2 on node1,
>>> its membership will include node2 (because of the shared cache manager)
>>> even though there is no cache instance for app2 on node2.  Consequently,
>>> the cache instances for app2 will be non-functional until app2 is
>>> deployed on node2.
>>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>>> demand.  So, even with a single app running on 2 nodes, the first
>>> request to cache an entity in a custom cache region on node1 will fail,
>>> since the cache corresponding to the region will not exist on node2.
>>> 
>>> Here's the relevant code in
>>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>> 
>>> String cacheName = cmd.getCacheName();
>>> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>>> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>>> while (cr == null && System.currentTimeMillis() < giveupTime) {
>>>  Thread.sleep(100);
>>>  cr = gcr.getNamedComponentRegistry(cacheName);
>>> }
>>> 
>>> if (cr == null) {
>>>  if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>>  return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>>  // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>>> }
>>> 
>>> From the perspective of the AS, a request for a non-existent cache should
>>> be treated the same way as a request for a stopped cache (that logic
>>> returns RequestIgnoredResponse.INSTANCE).
>>> As Galder pointed out, handling this case via exception was an explicit
>>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>>> exception is merely a workaround until this issue is fixed:
>>> https://jira.jboss.org/browse/ISPN-434
>>> 
>>> As it stands, this is a blocker issue for AS infinispan integration.
>>> 
>>> Thoughts?
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache