[infinispan-dev] RPCs for non-existent caches ought not throw exception

Tue Sep 14 10:27:14 EDT 2010

On 14 Sep 2010, at 05:04, Paul Ferraro wrote:

> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>> So in essence a "correct" response would be:
>> 
>> 1)  If the cache is stopping -> ACK with a ValidResponse
> 
> Do we have a notion of an ignored (but not invalid) response, i.e. don't
> trigger a retry/rollback?

We can certainly change this for RequestIgnoredResponse by overriding isValid() to return true since it is, as you say, a valid response.  Would need to run through the test suite to make sure such a change doesn't break anything though.
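
For illustration only, here is a minimal sketch of that change. The types below are simplified stand-ins (only the isValid() contract is modelled), not the actual Infinispan sources, and the wrapper class name IgnoredResponseSketch is made up for the example:

```java
// Simplified stand-ins for the response types discussed in this thread.
// These are NOT the real Infinispan classes; only isValid() is modelled.
class IgnoredResponseSketch {

    interface Response {
        boolean isValid();
    }

    // An invalid response is what currently triggers a retry/rollback on the caller.
    static class InvalidResponse implements Response {
        @Override
        public boolean isValid() {
            return false;
        }
    }

    // Still extends InvalidResponse, as it does today, but reports itself as valid
    // so that an ignored request is not treated as a failure.
    static final class RequestIgnoredResponse extends InvalidResponse {
        static final RequestIgnoredResponse INSTANCE = new RequestIgnoredResponse();

        private RequestIgnoredResponse() {
        }

        @Override
        public boolean isValid() {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(new InvalidResponse().isValid());            // prints false
        System.out.println(RequestIgnoredResponse.INSTANCE.isValid());  // prints true
    }
}
```

Any caller that needs to tell an ignored response apart from a normal one would still have to check the type, since isValid() alone no longer distinguishes them.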

>> 2)  If the cache is starting, try and wait till we can accept the RPC
> 
> Yes, except that ComponentStatus.startingUp() currently returns true for
> every status except RUNNING.  IMO, it would make more sense to
> restrict this to INSTANTIATED and INITIALIZING.

Again, startingUp() would need to be fixed accordingly - and tested.
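
As a rough sketch of the narrower check Paul describes, assuming a cut-down enum that lists only the statuses named in this thread and assuming allowInvocations() is true only for RUNNING (the wrapper class ComponentStatusSketch is hypothetical, and this is not the real ComponentStatus source):

```java
// Cut-down stand-in for ComponentStatus; only the statuses mentioned in this
// thread are listed, and the method bodies are assumptions for illustration.
class ComponentStatusSketch {

    enum ComponentStatus {
        INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

        // Assumption: invocations are only accepted once the component is RUNNING.
        boolean allowInvocations() {
            return this == RUNNING;
        }

        // Proposed behaviour: only the two pre-RUNNING states count as "starting up",
        // instead of every status other than RUNNING.
        boolean startingUp() {
            return this == INSTANTIATED || this == INITIALIZING;
        }
    }

    public static void main(String[] args) {
        for (ComponentStatus status : ComponentStatus.values()) {
            System.out.println(status
                    + " allowInvocations=" + status.allowInvocations()
                    + " startingUp=" + status.startingUp());
        }
    }
}
```

With this definition, a STOPPING or TERMINATED cache would fall straight through the wait loop in InboundInvocationHandlerImpl rather than spinning until the timeout expires.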

> 
>> 3)  If the cache doesn't exist, ACK with a valid response as well?  Surely this will lead to inconsistencies, since the RPC originator will assume the RPC has completed when in fact nothing has happened?
> 
> From the AS's perspective, an RPC for a non-existent cache (e.g. yet to
> be deployed app) should be handled no differently than an RPC for a
> stopping/stopped cache (e.g. undeployed app).
> I'm not suggesting we should lie to the RPC originator, but rather
> that it should be able to distinguish a normal valid response from an
> ignored (but valid) response.

Agreed, but how does this difference manifest itself from a caller's perspective?

> 
>> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>> 
>>> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>>>> I've had a brief look at this, need to spend a bit more time but here's an initial view on this,
>>>> 
>>>> At the moment at least, InboundInvocationHandlerImpl doesn't take into
>>>> account ComponentStatus to see if it's up. It only checks whether the
>>>> component registry is null, but a ComponentStatus check might make
>>>> more sense.
>>> 
>>> After the component registry null check, is the following:
>>> 
>>> if (!cr.getStatus().allowInvocations()) {
>>>    giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
>>>    while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
>>>    if (!cr.getStatus().allowInvocations()) {
>>>       log.warn("Cache named [{0}] exists but isn't in a state to handle invocations.  Its state is {1}.", cacheName, cr.getStatus());
>>>       return RequestIgnoredResponse.INSTANCE;
>>>    }
>>> }
>>> 
>>> So, there is, in fact, a ComponentStatus check.  If the registry is not
>>> RUNNING, then we spin for up to 30 seconds for the status to become
>>> RUNNING.  For a stopping or stopped cache, this does not seem to make
>>> sense, since these states do not indicate that the cache is in the
>>> process of starting.
>>> 
>>>> When I looked at this a while back, I'd ideally have liked to be able
>>>> to start a cache associated with the unknown cache request, however
>>>> this is not feasible cos you can't know what configuration it should
>>>> be started with.
>>>> 
>>>> At first glance, a different valid status would be the way forward,
>>>> but you have to think about the state transfer and distribution logic
>>>> and that's the hard bit. If a cache is started in a non-coordinator,
>>>> and the coordinator has not yet started that cache, how does state
>>>> transfer or rehash control work? Both of them rely on some kind of
>>>> logic running on coordinator. Now, who's the coordinator in that case?
>>>> The coordinator is in theory the first node started, but what if the
>>>> cache is not yet started in the coordinator? The coordinator now
>>>> becomes a variant of the Cache rather than the CacheManager.
>>>> 
>>>> I think the latter is the bigger problem to solve here.
>>> 
>>> Agreed.
>>> 
>>>> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>>>> 
>>>>> OK - the plot thickens...
>>>>> RequestIgnoredResponse is not actually appropriate because it's an
>>>>> invalid response (i.e. extends InvalidResponse).  Oops.
>>>>> So, not only would we either need to return a valid response (perhaps
>>>>> null, like the behavior prior to ISPN-447 ?), but an RPC for a stopped
>>>>> (or stopping) cache should also be considered valid.  For example, if I
>>>>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>>>>> would cause RPC-bound cache operations to fail on node1.  Actually,
>>>>> these RPCs would timeout, since the InboundInvocationHandler will wait
>>>>> 30 seconds for them to start.  That's no good.
>>>>> 
>>>>> To address this would require some changes to the behavior of some of
>>>>> the ComponentStatus values.  For example, ComponentStatus.startingUp()
>>>>> returns true for STOPPING and TERMINATED, and consequently
>>>>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>>>>> start.  That doesn't seem appropriate for the use case above.  Would it
>>>>> be possible to return a valid ignored response (e.g. null) for these
>>>>> states?
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>>>>>> In AS clustering, there are several use cases where a specific cache
>>>>>> instance may not exist (or may not be started) for every member of the
>>>>>> group.  Currently, Infinispan treats this as an exception case, and any
>>>>>> cache operation resulting in an RPC will fail.  This is problematic for
>>>>>> the following AS use cases:
>>>>>> 
>>>>>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>>>>>> caching) there is a shared cache manager for all applications, while
>>>>>> each application uses its own cache instance.  If I have app1 running on
>>>>>> node1 and node2, everything is fine.  But if I deploy app2 on node1,
>>>>>> its membership will include node2 (because of the shared cache manager)
>>>>>> even though there is no cache instance for app2 on node2.  Consequently,
>>>>>> the cache instances for app2 will be non-functional until app2 is
>>>>>> deployed on node2.
>>>>>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>>>>>> demand.  So, even with a single app running on 2 nodes, the first
>>>>>> request to cache an entity in a custom cache region on node1 will fail,
>>>>>> since the cache corresponding to the region will not exist on node2.
>>>>>> 
>>>>>> Here's the relevant code in
>>>>>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>>>>> 
>>>>>> String cacheName = cmd.getCacheName();
>>>>>> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>>>>>> while (cr == null && System.currentTimeMillis() < giveupTime) {
>>>>>>    Thread.sleep(100);
>>>>>>    cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>> }
>>>>>> 
>>>>>> if (cr == null) {
>>>>>>    if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>>>>>    return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>>>>>    // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>>>>>> }
>>>>>> 
>>>>>> From the perspective of the AS, a request for a non-existent cache should
>>>>>> be treated the same way as a request for a stopped cache (that logic
>>>>>> returns RequestIgnoredResponse.INSTANCE).
>>>>>> As Galder pointed out, handling this case via exception was an explicit
>>>>>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>>>>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>>>>>> exception is merely a workaround until this issue is fixed:
>>>>>> https://jira.jboss.org/browse/ISPN-434
>>>>>> 
>>>>>> As it stands, this is a blocker issue for AS infinispan integration.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>> 
>>>> 
>>>> --
>>>> Galder Zamarreño
>>>> Sr. Software Engineer
>>>> Infinispan, JBoss Cache
>>>> 
>>> 
>> 
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org
> 

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org