[infinispan-dev] RPCs for non-existent caches ought not throw exception

Mon Sep 27 12:15:03 EDT 2010

On 22 Sep 2010, at 00:07, Sanne Grinovero wrote:

> 2010/9/14 Manik Surtani <manik at jboss.org>:
>> FYI, just checked in
>> http://fisheye.jboss.org/changelog/Infinispan/branches/4.2.x?cs=2361
>> and tests run clean.
> 
> I just ran into this, and after finding ISPN-648 I tried to
> set strictPeerToPeer="false". This initially seemed to improve the
> situation, as no more complaints about non-existent caches were logged,
> but then I got timeouts during state transfers, so I opened ISPN-661
> (which contains the full stack traces of this timeout).
> 
> After changing Infinispan to time out after 10 minutes, I'm back to the
> exceptions
> "org.infinispan.remoting.InboundInvocationHandlerImpl] Cache named
> (cachename) does not exist on this cache manager!"
> 
> Is there any known workaround to have a second node join the cluster
> while not all caches are initialized at the same time?
> 
> BTW from what I understood it seems I definitely need
> strictPeerToPeer="false", shouldn't this be the default?
> I think my use case is quite common: I just start more than one cache
> lazily. (Also, I can't pre-start them, as the configuration is not known
> until a service requests a cache; the invoker's context affects this
> configuration.)
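To make Sanne's scenario concrete, here is a minimal, hypothetical Java model (not Infinispan API; the class and method names are made up): a cache, represented here by a plain Map, is created only when a local service first asks for it, with a configuration chosen at that moment. A lookup by name, which is what an inbound RPC handler does, finds nothing on a node where no local service has asked yet.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model of lazily started caches; not Infinispan code.
final class LazyCaches {

    private final Map<String, Map<Object, Object>> caches = new ConcurrentHashMap<>();
    // Builds a cache for a name; in the real use case the configuration
    // would depend on the invoker's context.
    private final Function<String, Map<Object, Object>> factory;

    LazyCaches(Function<String, Map<Object, Object>> factory) {
        this.factory = factory;
    }

    /** Local access path: starts the cache on first use. */
    Map<Object, Object> getOrStart(String name) {
        return caches.computeIfAbsent(name, factory);
    }

    /** Remote access path: never starts anything, so it may return null. */
    Map<Object, Object> lookup(String name) {
        return caches.get(name);
    }

    public static void main(String[] args) {
        LazyCaches node2 = new LazyCaches(name -> new ConcurrentHashMap<>());
        System.out.println(node2.lookup("sessions"));          // null: not yet requested locally
        node2.getOrStart("sessions");
        System.out.println(node2.lookup("sessions") != null);  // true
    }
}
```

The gap between the two access paths is exactly the window in which a remote node's RPC hits "Cache named ... does not exist on this cache manager!".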

You start the caches lazily, but the cache managers eagerly?

> 
> Cheers,
> Sanne
> 
> 
>> 
>> On 14 Sep 2010, at 15:27, Manik Surtani wrote:
>> 
>> On 14 Sep 2010, at 05:04, Paul Ferraro wrote:
>> 
>> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>> 
>> So in essence a "correct" response would be:
>> 
>> 1)  If the cache is stopping -> ACK with a ValidResponse
>> 
>> Do we have a notion of an ignored (but not invalid) response, i.e. one that
>> doesn't trigger a retry/rollback?
>> 
>> We can certainly change this for RequestIgnoredResponse by overriding
>> isValid() to return true since it is, as you say, a valid response.  Would
>> need to run through the test suite to make sure such a change doesn't break
>> anything though.
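A minimal sketch of Manik's suggestion, using a stand-alone, simplified response model rather than Infinispan's real classes: the names mirror those used in this thread, but the hierarchy itself is an assumption. RequestIgnoredResponse keeps extending InvalidResponse (as Paul points out further down) and only overrides isValid(), so a caller would no longer treat it as a failure.

```java
// Hypothetical, simplified model of the response types discussed in this
// thread. It is NOT Infinispan's real class hierarchy.
final class ResponseModel {

    interface Response {
        boolean isValid();
    }

    static class ValidResponse implements Response {
        @Override
        public boolean isValid() {
            return true;
        }
    }

    static class InvalidResponse implements Response {
        @Override
        public boolean isValid() {
            return false;
        }
    }

    // Still a subclass of InvalidResponse, but reported as valid so that
    // "cache not running here" does not trigger a retry/rollback.
    static final class RequestIgnoredResponse extends InvalidResponse {
        static final RequestIgnoredResponse INSTANCE = new RequestIgnoredResponse();

        private RequestIgnoredResponse() {
        }

        @Override
        public boolean isValid() {
            return true;
        }
    }

    public static void main(String[] args) {
        Response r = RequestIgnoredResponse.INSTANCE;
        System.out.println(r.isValid());                   // true
        System.out.println(r instanceof InvalidResponse);  // true
    }
}
```

Because the subclass relationship is kept, any code that branches on `instanceof InvalidResponse` would still behave differently from code that calls isValid(), which is why a full test-suite run would be needed.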
>> 
>> 2)  If the cache is starting, try and wait till we can accept the RPC
>> 
>> Yes, except that ComponentStatus.startingUp() currently returns true for
>> every status except RUNNING.  IMO, it would make more sense to
>> restrict this to INSTANTIATED and INITIALIZING.
>> 
>> Again, startingUp() would need to be fixed accordingly - and tested.
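The narrowed contract for startingUp() could look like the following sketch. It is a stand-alone enum limited to the five statuses named in this thread; the method bodies are assumptions about the proposed behaviour, not Infinispan's actual ComponentStatus.

```java
// Hypothetical sketch of the proposed startingUp() semantics; not Infinispan code.
final class StatusSketch {

    enum ComponentStatus {
        INSTANTIATED,
        INITIALIZING,
        RUNNING,
        STOPPING,
        TERMINATED;

        /** Only a RUNNING component may accept invocations. */
        boolean allowInvocations() {
            return this == RUNNING;
        }

        /**
         * True only while the component is on its way to RUNNING.
         * STOPPING and TERMINATED no longer count as "starting up",
         * so an inbound handler would not wait on them.
         */
        boolean startingUp() {
            return this == INSTANTIATED || this == INITIALIZING;
        }
    }

    public static void main(String[] args) {
        for (ComponentStatus s : ComponentStatus.values()) {
            System.out.println(s + " startingUp=" + s.startingUp());
        }
    }
}
```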
>> 
>> 
>> 3)  If the cache doesn't exist, ACK with a valid response as well?  Surely
>> this will lead to inconsistencies, since the RPC originator will assume the
>> RPC has completed when in fact nothing has happened?
>> 
>> From the AS's perspective, an RPC for a non-existent cache (e.g. a
>> yet-to-be-deployed app) should be handled no differently than an RPC for a
>> stopping/stopped cache (e.g. an undeployed app).
>> 
>> I'm not suggesting we should lie to the RPC originator, but rather
>> that it should be able to distinguish a normal valid response from an
>> ignored (but valid) response.
>> 
>> Agreed, but how does this difference manifest itself from a caller's
>> perspective?
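One possible answer to Manik's question, sketched under assumptions: the originator classifies each node's reply as applied, ignored or failed. Only a failure triggers a retry/rollback, while the ignored count tells the originator that nothing happened on those nodes, so it is not misled. The Outcome enum and Summary record are hypothetical and not part of Infinispan; the sketch needs Java 16+ for records and arrow-style switch.

```java
import java.util.Map;

// Hypothetical caller-side view of one RPC round; not Infinispan code.
final class CallerSketch {

    enum Outcome { APPLIED, IGNORED, FAILED }

    /** What the originator learns from one round of responses. */
    record Summary(int applied, int ignored, boolean needsRollback) { }

    static Summary summarize(Map<String, Outcome> responses) {
        int applied = 0;
        int ignored = 0;
        boolean failed = false;
        for (Outcome o : responses.values()) {
            switch (o) {
                case APPLIED -> applied++;
                case IGNORED -> ignored++;   // valid, but nothing was done there
                case FAILED -> failed = true; // only this should cause retry/rollback
            }
        }
        return new Summary(applied, ignored, failed);
    }

    public static void main(String[] args) {
        Summary s = summarize(Map.of("node2", Outcome.IGNORED, "node3", Outcome.APPLIED));
        System.out.println(s); // Summary[applied=1, ignored=1, needsRollback=false]
    }
}
```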
>> 
>> 
>> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>> 
>> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>> 
>> I've had a brief look at this and need to spend a bit more time, but here's an
>> initial view on this:
>>
>> At the moment at least, InboundInvocationHandlerImpl doesn't take into
>> account ComponentStatus to see if it's up. It only checks whether the
>> component registry is null, but a ComponentStatus check might make
>> more sense.
>> 
>> After the component registry null check comes the following:
>>
>>   if (!cr.getStatus().allowInvocations()) {
>>      giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
>>      while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
>>      if (!cr.getStatus().allowInvocations()) {
>>         log.warn("Cache named [{0}] exists but isn't in a state to handle invocations.  Its state is {1}.", cacheName, cr.getStatus());
>>         return RequestIgnoredResponse.INSTANCE;
>>      }
>>   }
>> 
>> So, there is, in fact, a ComponentStatus check.  If the registry is not
>> RUNNING, then we spin for up to 30 seconds for the status to become
>> RUNNING.  For a stopping or stopped cache, this does not seem to make
>> sense, since these states do not indicate that the cache is in the
>> process of starting.
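Paul's point can be sketched as a variant of the snippet above in which the handler only spins while the cache is actually starting, so a STOPPING or TERMINATED cache gets an ignored reply immediately instead of after the full wait. This is a hypothetical stand-alone version: the Status enum, the Reply type and the timeoutMillis parameter (standing in for localConfig.getStateRetrievalTimeout()) are illustrative, not Infinispan code.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a status check that waits only for starting caches.
final class InboundStatusSketch {

    enum Status {
        INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

        boolean allowInvocations() { return this == RUNNING; }

        boolean startingUp() { return this == INSTANTIATED || this == INITIALIZING; }
    }

    /** INVOKE = hand the command to the cache; IGNORED = valid "not applied here" reply. */
    enum Reply { INVOKE, IGNORED }

    static Reply checkStatus(Supplier<Status> status, long timeoutMillis) {
        if (status.get().allowInvocations()) {
            return Reply.INVOKE;
        }
        long giveupTime = System.currentTimeMillis() + timeoutMillis;
        try {
            // Spin only while the cache is on its way up; STOPPING and
            // TERMINATED fall straight through without waiting.
            while (status.get().startingUp() && System.currentTimeMillis() < giveupTime) {
                Thread.sleep(100);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        }
        return status.get().allowInvocations() ? Reply.INVOKE : Reply.IGNORED;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Reply r = checkStatus(() -> Status.TERMINATED, 30_000);
        long waited = System.currentTimeMillis() - start;
        System.out.println(r + " after " + waited + " ms"); // IGNORED, returned without waiting
    }
}
```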
>> 
>> When I looked at this a while back, I'd ideally have liked to be able
>> to start a cache associated with the unknown cache request; however,
>> this is not feasible cos you can't know what configuration it should
>> be started with.
>> 
>> At first glance, a different valid status would be the way forward,
>> but you have to think about the state transfer and distribution logic,
>> and that's the hard bit. If a cache is started in a non-coordinator,
>> and the coordinator has not yet started that cache, how does state
>> transfer or rehash control work? Both of them rely on some kind of
>> logic running on the coordinator. Now, who's the coordinator in that case?
>> The coordinator is in theory the first node started, but what if the
>> cache is not yet started in the coordinator? The coordinator now
>> becomes a property of the Cache rather than of the CacheManager.
>> 
>> I think the latter is the bigger problem to solve here.
>> 
>> Agreed.
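Galder's coordinator question can be illustrated with a small hypothetical helper: if caches start lazily, the node that drives state transfer or rehash for a given cache would be the first member of the cluster view that actually runs that cache. The sketch assumes the view lists members oldest-first (so the CacheManager coordinator is at index 0); node names and data structures are made up for illustration and are not Infinispan or JGroups API.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical per-cache coordinator selection; not Infinispan code.
final class PerCacheCoordinator {

    static Optional<String> coordinatorFor(List<String> view, Set<String> nodesRunningCache) {
        // Assuming the view is ordered oldest-first, the first match is the
        // oldest member that has this cache started.
        return view.stream().filter(nodesRunningCache::contains).findFirst();
    }

    public static void main(String[] args) {
        List<String> view = List.of("node1", "node2", "node3");
        // The cache has not been started on node1, the CacheManager coordinator.
        System.out.println(coordinatorFor(view, Set.of("node2", "node3"))); // Optional[node2]
        System.out.println(coordinatorFor(view, Set.of()));                 // Optional.empty
    }
}
```

The empty case is the awkward one the thread points at: a cache that nobody has started yet has no coordinator at all.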
>> 
>> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>> 
>> OK - the plot thickens...
>> 
>> RequestIgnoredResponse is not actually appropriate because it's an
>> invalid response (i.e. extends InvalidResponse).  Oops.
>> 
>> So, not only would we need to return a valid response (perhaps
>> null, like the behavior prior to ISPN-447?), but an RPC for a stopped
>> (or stopping) cache should also be considered valid.  For example, if I
>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>> would cause RPC-bound cache operations to fail on node1.  Actually,
>> these RPCs would time out, since the InboundInvocationHandler will wait
>> 30 seconds for them to start.  That's no good.
>> 
>> To address this would require some changes to the behavior of some of
>> the ComponentStatus values.  For example, ComponentStatus.startingUp()
>> returns true for STOPPING and TERMINATED, and consequently
>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>> start.  That doesn't seem appropriate for the use case above.  Would it
>> be possible to return a valid ignored response (e.g. null) for these
>> states?
>> 
>> Thoughts?
>> 
>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>> 
>> In AS clustering, there are several use cases where a specific cache
>> instance may not exist (or may not be started) for every member of the
>> group.  Currently, Infinispan treats this as an exception case, and any
>> cache operation resulting in an RPC will fail.  This is problematic for
>> the following AS use cases:
>> 
>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>> caching) there is a shared cache manager for all applications, while
>> each application uses its own cache instance.  If I have app1 running on
>> node1 and node2, everything is fine.  But if I deploy app2 on node1,
>> its membership will include node2 (because of the shared cache manager)
>> even though there is no cache instance for app2 on node2.  Consequently,
>> the cache instances for app2 will be non-functional until app2 is
>> deployed on node2.
>> 
>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>> demand.  So, even with a single app running on 2 nodes, the first
>> request to cache an entity in a custom cache region on node1 will fail,
>> since the cache corresponding to the region will not exist on node2.
>> 
>> Here is the relevant code in
>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>
>>   String cacheName = cmd.getCacheName();
>>   ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>>   long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>>   while (cr == null && System.currentTimeMillis() < giveupTime) {
>>      Thread.sleep(100);
>>      cr = gcr.getNamedComponentRegistry(cacheName);
>>   }
>>   if (cr == null) {
>>      if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>      return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>      // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>>   }
>> 
>> From the perspective of the AS, a request for a non-existent cache should
>> be treated the same way as a request for a stopped cache (that logic
>> returns RequestIgnoredResponse.INSTANCE).
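Paul's proposal for missing caches can be sketched as follows: after a bounded wait for the named registry, a cache that still does not exist gets the same ignored reply as a stopped one, instead of an ExceptionResponse. The Registry record, Reply enum and lookup function are hypothetical stand-ins for Infinispan internals, and the wait is a parameter rather than the hard-coded 30000 ms. Records require Java 16+.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: treat a non-existent cache like a stopped one.
final class MissingCacheSketch {

    record Registry(String cacheName, boolean running) { }

    enum Reply { INVOKE, IGNORED }

    static Reply handle(String cacheName, Function<String, Registry> lookup, long waitMillis) {
        Registry cr = lookup.apply(cacheName);
        long giveupTime = System.currentTimeMillis() + waitMillis;
        try {
            // Bounded wait in case the cache is about to be created.
            while (cr == null && System.currentTimeMillis() < giveupTime) {
                Thread.sleep(100);
                cr = lookup.apply(cacheName);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        }
        if (cr == null || !cr.running()) {
            // Non-existent and stopped caches get the same valid "ignored" reply.
            return Reply.IGNORED;
        }
        return Reply.INVOKE;
    }

    public static void main(String[] args) {
        Map<String, Registry> registries = new ConcurrentHashMap<>();
        registries.put("app1", new Registry("app1", true));
        System.out.println(handle("app1", registries::get, 500)); // INVOKE
        System.out.println(handle("app2", registries::get, 500)); // IGNORED
    }
}
```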
>> 
>> As Galder pointed out, handling this case via exception was an explicit
>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>
>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>> exception is merely a workaround until this issue is fixed:
>> https://jira.jboss.org/browse/ISPN-434
>>
>> As it stands, this is a blocker issue for AS infinispan integration.
>>
>> Thoughts?
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>> 
>> --
>> Manik Surtani
>> manik at jboss.org
>> Lead, Infinispan
>> Lead, JBoss Cache
>> http://www.infinispan.org
>> http://www.jbosscache.org

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org