[infinispan-dev] RPCs for non-existent caches ought not throw exception
Manik Surtani
manik at jboss.org
Tue Sep 14 10:28:46 EDT 2010
FYI, just checked in
http://fisheye.jboss.org/changelog/Infinispan/branches/4.2.x?cs=2361
and tests run clean.
On 14 Sep 2010, at 15:27, Manik Surtani wrote:
>
> On 14 Sep 2010, at 05:04, Paul Ferraro wrote:
>
>> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>>> So in essence a "correct" response would be:
>>>
>>> 1) If the cache is stopping -> ACK with a ValidResponse
>>
>> Do we have a notion of an ignored (but not invalid) response, i.e. one
>> that doesn't trigger a retry/rollback?
>
> We can certainly change this for RequestIgnoredResponse by overriding isValid() to return true since it is, as you say, a valid response. Would need to run through the test suite to make sure such a change doesn't break anything though.
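Roughly what that change might look like (a sketch only, untested; it assumes the singleton shape shown in the snippets further down and that the isValid() inherited from InvalidResponse can simply be overridden):

    public class RequestIgnoredResponse extends InvalidResponse {
       public static final RequestIgnoredResponse INSTANCE = new RequestIgnoredResponse();
       private RequestIgnoredResponse() {}

       @Override
       public boolean isValid() {
          // An "ignored" reply is still a well-formed reply; reporting it as
          // valid means the originator won't treat it as a failure that needs
          // a retry/rollback.
          return true;
       }
    }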
>
>>> 2) If the cache is starting, try and wait till we can accept the RPC
>>
>> Yes, except that ComponentStatus.startingUp() currently returns true for
>> every status except RUNNING. IMO, it would make more sense to
>> restrict this to INSTANTIATED and INITIALIZING.
>
> Again, startingUp() would need to be fixed accordingly - and tested.
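A minimal sketch of the restriction Paul describes (assuming ComponentStatus is the enum in question and using only the state names already mentioned in this thread):

    // Sketch: report "starting up" only for states that genuinely precede
    // RUNNING, so a STOPPING or TERMINATED cache no longer makes the inbound
    // handler wait for it to start.
    public boolean startingUp() {
       return this == INSTANTIATED || this == INITIALIZING;
    }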
>
>>
>>> 3) If the cache doesn't exist, ACK with a valid response as well? Surely this will lead to inconsistencies, since the RPC originator will assume the RPC has completed when in fact nothing has happened?
>>
>> From the AS's perspective, an RPC for a non-existent cache (e.g. yet to
>> be deployed app) should be handled no differently than an RPC for a
>> stopping/stopped cache (e.g. undeployed app).
>> I'm not suggesting we should lie to the RPC originator, but rather
>> that it should be able to distinguish a normal valid response from an
>> ignored (but valid) response.
>
> Agreed, but how does this difference manifest itself from a caller's perspective?
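For illustration, one way the distinction could surface on the calling side (a sketch only; the response map and the exception used here are assumptions, not existing API):

    // Sketch: the originator skips ignored-but-valid replies instead of
    // treating them as failures that force a retry or rollback.
    for (Map.Entry<Address, Response> entry : responses.entrySet()) {
       Response rsp = entry.getValue();
       if (rsp instanceof RequestIgnoredResponse) {
          continue; // target has no such cache (or it is stopping); nothing to apply
       }
       if (!rsp.isValid()) {
          throw new IllegalStateException("Invalid response from " + entry.getKey() + ": " + rsp);
       }
       // ... process the valid response as usual ...
    }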
>
>>
>>> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>>>
>>>> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>>>>> I've had a brief look at this and need to spend a bit more time, but here's an initial view:
>>>>>
>>>>> At the moment at least, InboundInvocationHandlerImpl doesn't take into
>>>>> account ComponentStatus to see if it's up. It only checks whether the
>>>>> component registry is null, but a ComponentStatus check might make
>>>>> more sense.
>>>>
>>>> After the component registry null check, is the following:
>>>>
>>>> if (!cr.getStatus().allowInvocations()) {
>>>>    giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
>>>>    while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
>>>>    if (!cr.getStatus().allowInvocations()) {
>>>>       log.warn("Cache named [{0}] exists but isn't in a state to handle invocations. Its state is {1}.", cacheName, cr.getStatus());
>>>>       return RequestIgnoredResponse.INSTANCE;
>>>>    }
>>>> }
>>>>
>>>> So, there is, in fact, a ComponentStatus check. If the registry is not
>>>> RUNNING, then we spin for up to 30 seconds for the status to become
>>>> RUNNING. For a stopping or stopped cache, this does not seem to make
>>>> sense, since these states do not indicate that the cache is in the
>>>> process of starting.
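One way that wait could be avoided for caches that are shutting down (a rough sketch, untested, building on the snippet above and assuming startingUp() is narrowed to the genuinely-starting states as discussed elsewhere in this thread):

    if (!cr.getStatus().allowInvocations()) {
       // Answer immediately for caches that will never reach RUNNING,
       // instead of spinning until the timeout expires.
       if (!cr.getStatus().startingUp()) {
          return RequestIgnoredResponse.INSTANCE; // stopping/terminated: treat like a stopped cache
       }
       giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
       while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
       if (!cr.getStatus().allowInvocations()) {
          return RequestIgnoredResponse.INSTANCE;
       }
    }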
>>>>
>>>>> When I looked at this a while back, I'd ideally have liked to be able
>>>>> to start the cache associated with the unknown cache request; however,
>>>>> this is not feasible because you can't know what configuration it should
>>>>> be started with.
>>>>>
>>>>> At first glance, a different valid status would be the way forward,
>>>>> but you have to think about the state transfer and distribution logic
>>>>> and that's the hard bit. If a cache is started on a non-coordinator node,
>>>>> and the coordinator has not yet started that cache, how does state
>>>>> transfer or rehash control work? Both rely on some kind of logic
>>>>> running on the coordinator. Now, who is the coordinator in that case?
>>>>> The coordinator is in theory the first node started, but what if the
>>>>> cache is not yet started on the coordinator? The notion of coordinator
>>>>> then becomes a property of the Cache rather than of the CacheManager.
>>>>>
>>>>> I think the latter is the bigger problem to solve here.
>>>>
>>>> Agreed.
>>>>
>>>>> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>>>>>
>>>>>> OK - the plot thickens...
>>>>>> RequestIgnoredResponse is not actually appropriate because it's an
>>>>>> invalid response (i.e. extends InvalidResponse). Oops.
>>>>>> So, not only would we need to return a valid response (perhaps
>>>>>> null, like the behavior prior to ISPN-447?), but an RPC for a stopped
>>>>>> (or stopping) cache should also be considered valid. For example, if I
>>>>>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>>>>>> would cause RPC-bound cache operations to fail on node1. Actually,
>>>>>> these RPCs would time out, since the InboundInvocationHandler will wait
>>>>>> 30 seconds for them to start. That's no good.
>>>>>>
>>>>>> To address this would require some changes to the behavior of some of
>>>>>> the ComponentStatus values. For example, ComponentStatus.startingUp()
>>>>>> returns true for STOPPING and TERMINATED, and consequently
>>>>>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>>>>>> start. That doesn't seem appropriate for the use case above. Would it
>>>>>> be possible to return a valid ignored response (e.g. null) for these
>>>>>> states?
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>>>>>>> In AS clustering, there are several use cases where a specific cache
>>>>>>> instance may not exist (or may not be started) for every member of the
>>>>>>> group. Currently, Infinispan treats this as an exception case, and any
>>>>>>> cache operation resulting in an RPC will fail. This is problematic for
>>>>>>> the following AS use cases:
>>>>>>>
>>>>>>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>>>>>>> caching) there is a shared cache manager for all applications, while
>>>>>>> each application uses its own cache instance. If I have app1 running on
>>>>>>> node1 and node2, everything is fine. But if I deploy app2 on node1,
>>>>>>> its membership will include node2 (because of the shared cache manager)
>>>>>>> even though there is no cache instance for app2 on node2. Consequently,
>>>>>>> the cache instances for app2 will be non-functional until app2 is
>>>>>>> deployed on node2.
>>>>>>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>>>>>>> demand. So, even with a single app running on 2 nodes, the first
>>>>>>> request to cache an entity in a custom cache region on node1 will fail,
>>>>>>> since the cache corresponding to the region will not exist on node2.
>>>>>>>
>>>>>>> Here's the relevant code in
>>>>>>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>>>>>>
>>>>>>> String cacheName = cmd.getCacheName();
>>>>>>> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>>> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>>>>>>> while (cr == null && System.currentTimeMillis() < giveupTime) {
>>>>>>>    Thread.sleep(100);
>>>>>>>    cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>>> }
>>>>>>>
>>>>>>> if (cr == null) {
>>>>>>>    if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>>>>>>    return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>>>>>>    // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>>>>>>> }
>>>>>>>
>>>>>>> From the perspective of the AS, a request for a non-existent cache should
>>>>>>> be treated the same way as a request for a stopped cache (that logic
>>>>>>> returns RequestIgnoredResponse.INSTANCE).
>>>>>>> As Galder pointed out, handling this case via exception was an explicit
>>>>>>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>>>>>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>>>>>>> exception is merely a workaround until this issue is fixed:
>>>>>>> https://jira.jboss.org/browse/ISPN-434
>>>>>>>
>>>>>>> As it stands, this is a blocker issue for AS Infinispan integration.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>
>>>>> --
>>>>> Galder Zamarreño
>>>>> Sr. Software Engineer
>>>>> Infinispan, JBoss Cache
>>>>>
>>>>>
>>>
>
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org