[infinispan-dev] RPCs for non-existent caches ought not throw exception
Manik Surtani
manik at jboss.org
Tue Sep 14 10:28:46 EDT 2010
FYI, just checked in
http://fisheye.jboss.org/changelog/Infinispan/branches/4.2.x?cs=2361
and tests run clean.
On 14 Sep 2010, at 15:27, Manik Surtani wrote:
>
> On 14 Sep 2010, at 05:04, Paul Ferraro wrote:
>
>> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>>> So in essence a "correct" response would be:
>>>
>>> 1) If the cache is stopping -> ACK with a ValidResponse
>>
>> Do we have a notion of an ignored (but not invalid) response, i.e. one
>> that doesn't trigger a retry/rollback?
>
> We can certainly change this for RequestIgnoredResponse by overriding isValid() to return true since it is, as you say, a valid response. Would need to run through the test suite to make sure such a change doesn't break anything though.
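Roughly what that change might look like (a sketch only, untested; it assumes the singleton shape shown in the snippets further down and that the isValid() inherited from InvalidResponse can simply be overridden):

    public class RequestIgnoredResponse extends InvalidResponse {
       public static final RequestIgnoredResponse INSTANCE = new RequestIgnoredResponse();
       private RequestIgnoredResponse() {}

       @Override
       public boolean isValid() {
          // An "ignored" reply is still a well-formed reply; reporting it as
          // valid means the originator won't treat it as a failure that needs
          // a retry/rollback.
          return true;
       }
    }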
>
>>> 2) If the cache is starting, try and wait till we can accept the RPC
>>
>> Yes, except that ComponentStatus.startingUp() currently returns true for
>> every status except RUNNING. IMO, it would make more sense to
>> restrict this to INSTANTIATED and INITIALIZING.
>
> Again, startingUp() would need to be fixed accordingly - and tested.
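A minimal sketch of the restriction Paul describes (assuming ComponentStatus is the enum in question and using only the state names already mentioned in this thread):

    // Sketch: report "starting up" only for states that genuinely precede
    // RUNNING, so a STOPPING or TERMINATED cache no longer makes the inbound
    // handler wait for it to start.
    public boolean startingUp() {
       return this == INSTANTIATED || this == INITIALIZING;
    }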
>
>>
>>> 3) If the cache doesn't exist, ACK with a valid response as well? Surely this will lead to inconsistencies, since the RPC originator will assume the RPC has completed when in fact nothing has happened?
>>
>> From the AS's perspective, an RPC for a non-existent cache (e.g. yet to
>> be deployed app) should be handled no differently than an RPC for a
>> stopping/stopped cache (e.g. undeployed app).
>> I'm not suggesting we should lie to the RPC originator, but rather
>> that it should be able to distinguish a normal valid response from an
>> ignored (but valid) response.
>
> Agreed, but how does this difference manifest itself from a caller's perspective?
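For illustration, one way the distinction could surface on the calling side (a sketch only; the response map and the exception used here are assumptions, not existing API):

    // Sketch: the originator skips ignored-but-valid replies instead of
    // treating them as failures that force a retry or rollback.
    for (Map.Entry<Address, Response> entry : responses.entrySet()) {
       Response rsp = entry.getValue();
       if (rsp instanceof RequestIgnoredResponse) {
          continue; // target has no such cache (or it is stopping); nothing to apply
       }
       if (!rsp.isValid()) {
          throw new IllegalStateException("Invalid response from " + entry.getKey() + ": " + rsp);
       }
       // ... process the valid response as usual ...
    }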
>
>>
>>> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>>>
>>>> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>>>>> I've had a brief look at this and need to spend a bit more time, but here's an initial view:
>>>>>
>>>>> At the moment at least, InboundInvocationHandlerImpl doesn't take into
>>>>> account ComponentStatus to see if it's up. It only checks whether the
>>>>> component registry is null, but a ComponentStatus check might make
>>>>> more sense.
>>>>
>>>> After the component registry null check, is the following:
>>>>
>>>> if (!cr.getStatus().allowInvocations()) {
>>>>    giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
>>>>    while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
>>>>    if (!cr.getStatus().allowInvocations()) {
>>>>       log.warn("Cache named [{0}] exists but isn't in a state to handle invocations. Its state is {1}.", cacheName, cr.getStatus());
>>>>       return RequestIgnoredResponse.INSTANCE;
>>>>    }
>>>> }
>>>>
>>>> So, there is, in fact, a ComponentStatus check. If the registry is not
>>>> RUNNING, then we spin for up to 30 seconds for the status to become
>>>> RUNNING. For a stopping or stopped cache, this does not seem to make
>>>> sense, since these states do not indicate that the cache is in the
>>>> process of starting.
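One way that wait could be avoided for caches that are shutting down (a rough sketch, untested, building on the snippet above and assuming startingUp() is narrowed to the genuinely-starting states as discussed elsewhere in this thread):

    if (!cr.getStatus().allowInvocations()) {
       // Answer immediately for caches that will never reach RUNNING,
       // instead of spinning until the timeout expires.
       if (!cr.getStatus().startingUp()) {
          return RequestIgnoredResponse.INSTANCE; // stopping/terminated: treat like a stopped cache
       }
       giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
       while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
       if (!cr.getStatus().allowInvocations()) {
          return RequestIgnoredResponse.INSTANCE;
       }
    }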
>>>>
>>>>> When I looked at this a while back, I'd ideally have liked to be able
>>>>> to start the cache associated with the unknown cache request; however,
>>>>> this is not feasible because you can't know what configuration it should
>>>>> be started with.
>>>>>
>>>>> At first glance, a different valid status would be the way forward,
>>>>> but you have to think about the state transfer and distribution logic
>>>>> and that's the hard bit. If a cache is started on a non-coordinator node,
>>>>> and the coordinator has not yet started that cache, how does state
>>>>> transfer or rehash control work? Both rely on some kind of logic
>>>>> running on the coordinator. Now, who is the coordinator in that case?
>>>>> The coordinator is in theory the first node started, but what if the
>>>>> cache is not yet started on the coordinator? The notion of coordinator
>>>>> then becomes a property of the Cache rather than of the CacheManager.
>>>>>
>>>>> I think the latter is the bigger problem to solve here.
>>>>
>>>> Agreed.
>>>>
>>>>> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>>>>>
>>>>>> OK - the plot thickens...
>>>>>> RequestIgnoredResponse is not actually appropriate because it's an
>>>>>> invalid response (i.e. extends InvalidResponse). Oops.
>>>>>> So, not only would we need to return a valid response (perhaps
>>>>>> null, like the behavior prior to ISPN-447?), but an RPC for a stopped
>>>>>> (or stopping) cache should also be considered valid. For example, if I
>>>>>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>>>>>> would cause RPC-bound cache operations to fail on node1. Actually,
>>>>>> these RPCs would time out, since the InboundInvocationHandler will wait
>>>>>> 30 seconds for them to start. That's no good.
>>>>>>
>>>>>> To address this would require some changes to the behavior of some of
>>>>>> the ComponentStatus values. For example, ComponentStatus.startingUp()
>>>>>> returns true for STOPPING and TERMINATED, and consequently
>>>>>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>>>>>> start. That doesn't seem appropriate for the use case above. Would it
>>>>>> be possible to return a valid ignored response (e.g. null) for these
>>>>>> states?
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>>>>>>> In AS clustering, there are several use cases where a specific cache
>>>>>>> instance may not exist (or may not be started) for every member of the
>>>>>>> group. Currently, Infinispan treats this as an exception case, and any
>>>>>>> cache operation resulting in an RPC will fail. This is problematic for
>>>>>>> the following AS use cases:
>>>>>>>
>>>>>>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>>>>>>> caching) there is a shared cache manager for all applications, while
>>>>>>> each application uses its own cache instance. If I have app1 running on
>>>>>>> node1 and node2, everything is fine. But if I deploy app2 on node1,
>>>>>>> its membership will include node2 (because of the shared cache manager)
>>>>>>> even though there is no cache instance for app2 on node2. Consequently,
>>>>>>> the cache instances for app2 will be non-functional until app2 is
>>>>>>> deployed on node2.
>>>>>>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>>>>>>> demand. So, even with a single app running on 2 nodes, the first
>>>>>>> request to cache an entity in a custom cache region on node1 will fail,
>>>>>>> since the cache corresponding to the region will not exist on node2.
>>>>>>>
>>>>>>> Here's the relevant code in
>>>>>>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>>>>>>
>>>>>>> String cacheName = cmd.getCacheName();
>>>>>>> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>>> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>>>>>>> while (cr == null && System.currentTimeMillis() < giveupTime) {
>>>>>>>    Thread.sleep(100);
>>>>>>>    cr = gcr.getNamedComponentRegistry(cacheName);
>>>>>>> }
>>>>>>>
>>>>>>> if (cr == null) {
>>>>>>>    if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>>>>>>    return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>>>>>>    // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>>>>>>> }
>>>>>>>
>>>>>>> From the perspective of the AS, a request for a non-existent cache should
>>>>>>> be treated the same way as a request for a stopped cache (that logic
>>>>>>> returns RequestIgnoredResponse.INSTANCE).
>>>>>>> As Galder pointed out, handling this case via exception was an explicit
>>>>>>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>>>>>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>>>>>>> exception is merely a workaround until this issue is fixed:
>>>>>>> https://jira.jboss.org/browse/ISPN-434
>>>>>>>
>>>>>>> As it stands, this is a blocker issue for AS Infinispan integration.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>
>>>>> --
>>>>> Galder Zamarreño
>>>>> Sr. Software Engineer
>>>>> Infinispan, JBoss Cache
>>>>>
>>>>>
>>>
>
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org