[infinispan-dev] RPCs for non-existant caches ought not throw exception
Manik Surtani
manik at jboss.org
Mon Sep 27 12:15:03 EDT 2010
On 22 Sep 2010, at 00:07, Sanne Grinovero wrote:
> 2010/9/14 Manik Surtani <manik at jboss.org>:
>> FYI, just checked in
>> http://fisheye.jboss.org/changelog/Infinispan/branches/4.2.x?cs=2361
>> and tests run clean.
>
> I just crashed against this, and after finding ISPN-648 I tried to
> set strictPeerToPeer="false". This initially seemed to improve the
> situation, as no more complaints about nonexistent caches were logged,
> but then I got timeouts during StateTransfers, so I opened ISPN-661
> (which contains the full stack traces of this timeout).
>
> After changing Infinispan to time out after 10 minutes, I'm back to the
> exceptions
> "org.infinispan.remoting.InboundInvocationHandlerImpl] Cache named
> (cachename) does not exist on this cache manager!"
>
> Is there any known workaround to have a second node join the cluster
> while not all caches are initialized at the same time?
>
> BTW, from what I understood it seems I definitely need
> strictPeerToPeer="false". Shouldn't this be the default?
> I think my use case is quite common: I just start more than one cache
> lazily. (Also, I can't pre-start them, as the configuration is not known
> until a service requests a cache; the invoker's context affects this
> configuration.)
You start the caches lazily, but the cache managers eagerly?
>
> Cheers,
> Sanne
>
>
>>
>> On 14 Sep 2010, at 15:27, Manik Surtani wrote:
>>
>> On 14 Sep 2010, at 05:04, Paul Ferraro wrote:
>>
>> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>>
>> So in essence a "correct" response would be:
>>
>> 1) If the cache is stopping -> ACK with a ValidResponse
>>
>> Do we have a notion of an ignored (but not invalid) response, i.e. one
>> that doesn't trigger a retry/rollback?
>>
>> We can certainly change this for RequestIgnoredResponse by overriding
>> isValid() to return true since it is, as you say, a valid response. Would
>> need to run through the test suite to make sure such a change doesn't break
>> anything though.
>>
>> 2) If the cache is starting, try and wait till we can accept the RPC
>>
>> Yes, except that ComponentStatus.startingUp() currently returns true for
>> every status except RUNNING. IMO, it would make more sense to
>> restrict this to INSTANTIATED and INITIALIZING.
>>
>> Again, startingUp() would need to be fixed accordingly - and tested.
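
To make the proposed change concrete, here is a minimal sketch of the two startingUp() behaviours side by side. It is not Infinispan's ComponentStatus source: SketchStatus and startingUpCurrent() are hypothetical names, and the enum lists only the states named in this thread.

```java
// Hypothetical sketch; this is NOT Infinispan's ComponentStatus source.
// Only the states named in this thread are listed.
enum SketchStatus {
    INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

    // Only a running component may handle an inbound RPC.
    boolean allowInvocations() {
        return this == RUNNING;
    }

    // Proposed behaviour: true only while the component is on its way up.
    boolean startingUp() {
        return this == INSTANTIATED || this == INITIALIZING;
    }

    // Behaviour as described in the thread: true for everything except RUNNING.
    boolean startingUpCurrent() {
        return this != RUNNING;
    }
}

final class SketchStatusDemo {
    public static void main(String[] args) {
        // Print both answers for each state so the difference is visible.
        for (SketchStatus s : SketchStatus.values()) {
            System.out.println(s + " current=" + s.startingUpCurrent()
                    + " proposed=" + s.startingUp());
        }
    }
}
```

The two variants differ only for STOPPING and TERMINATED, which are exactly the states where waiting for the cache to start makes no sense.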
>>
>>
>> 3) If the cache doesn't exist, ACK with a valid response as well? Surely
>> this will lead to inconsistencies, since the RPC originator will assume the
>> RPC has completed when in fact nothing has happened?
>>
>> From the AS's perspective, an RPC for a non-existent cache (e.g. yet to
>>
>> be deployed app) should be handled no differently than an RPC for a
>>
>> stopping/stopped cache (e.g. undeployed app).
>>
>> I'm not suggesting we should lie to the RPC originator, but rather
>>
>> that it should be able to distinguish a normal valid response from an
>>
>> ignored (but valid) response.
>>
>> Agreed, but how does this difference manifest itself from a caller's
>> perspective?
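
One possible answer, as a rough sketch only: the response carries a second flag next to validity, and the caller counts "applied" separately from "needs retry". All type and method names below (SketchResponse, SketchOk, SketchIgnored, SketchFailed, SketchCaller) are hypothetical and do not mirror Infinispan's response classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; they do not mirror Infinispan's classes.
interface SketchResponse {
    boolean isValid();                              // false means the caller should retry or roll back
    default boolean isIgnored() { return false; }   // true means the node did no work for this RPC
}

final class SketchOk implements SketchResponse {
    final Object value;
    SketchOk(Object value) { this.value = value; }
    public boolean isValid() { return true; }
}

// Valid (no retry or rollback) but explicitly marked as "nothing happened here".
final class SketchIgnored implements SketchResponse {
    static final SketchIgnored INSTANCE = new SketchIgnored();
    private SketchIgnored() { }
    public boolean isValid() { return true; }
    @Override public boolean isIgnored() { return true; }
}

final class SketchFailed implements SketchResponse {
    public boolean isValid() { return false; }
}

final class SketchCaller {
    // Number of nodes that actually applied the command.
    static long applied(List<SketchResponse> responses) {
        return responses.stream().filter(r -> r.isValid() && !r.isIgnored()).count();
    }

    // Any invalid response means the originator must retry or roll back.
    static boolean needsRetry(List<SketchResponse> responses) {
        return responses.stream().anyMatch(r -> !r.isValid());
    }

    public static void main(String[] args) {
        List<SketchResponse> rsps =
                Arrays.<SketchResponse>asList(new SketchOk("v"), SketchIgnored.INSTANCE);
        System.out.println("applied=" + applied(rsps) + " needsRetry=" + needsRetry(rsps));
    }
}
```

With this shape the originator is not lied to: an ignored response never triggers a retry, yet it is also never counted as a node that holds the data.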
>>
>>
>> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>>
>> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>>
>> I've had a brief look at this, need to spend a bit more time but here's an
>> initial view on this,
>>
>> At the moment at least, InboundInvocationHandlerImpl doesn't take into
>>
>> account ComponentStatus to see if it's up. It only checks whether the
>>
>> component registry is null, but a ComponentStatus check might make
>>
>> more sense.
>>
>> After the component registry null check comes the following:
>>
>> if (!cr.getStatus().allowInvocations()) {
>>    giveupTime = System.currentTimeMillis() + localConfig.getStateRetrievalTimeout();
>>    while (cr.getStatus().startingUp() && System.currentTimeMillis() < giveupTime) Thread.sleep(100);
>>    if (!cr.getStatus().allowInvocations()) {
>>       log.warn("Cache named [{0}] exists but isn't in a state to handle invocations. Its state is {1}.", cacheName, cr.getStatus());
>>       return RequestIgnoredResponse.INSTANCE;
>>    }
>> }
>>
>> So, there is, in fact, a ComponentStatus check. If the registry is not
>>
>> RUNNING, then we spin for up to 30 seconds for the status to become
>>
>> RUNNING. For a stopping or stopped cache, this does not seem to make
>>
>> sense, since these states do not indicate that the cache is in the
>>
>> process of starting.
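
A minimal sketch of a wait loop that avoids this, assuming the status can be polled through a supplier: it waits only while the cache is on its way up and gives up at once for a stopping or terminated cache. SketchState, SketchWait and awaitRunning are hypothetical names, not Infinispan API.

```java
import java.util.function.Supplier;

// Hypothetical states for illustration; not Infinispan's ComponentStatus.
enum SketchState { INITIALIZING, RUNNING, STOPPING, TERMINATED }

final class SketchWait {
    // Returns true if the cache reached RUNNING before the deadline.
    // A cache that is not on its way up is rejected without waiting.
    static boolean awaitRunning(Supplier<SketchState> status, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            SketchState s = status.get();
            if (s == SketchState.RUNNING) return true;
            if (s != SketchState.INITIALIZING) return false;     // stopping or terminated: do not spin
            if (System.nanoTime() - deadline >= 0) return false; // still starting, but out of time
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();              // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // A stopping cache is answered immediately instead of after the full timeout.
        System.out.println(awaitRunning(() -> SketchState.STOPPING, 30_000, 100));
        System.out.println(awaitRunning(() -> SketchState.RUNNING, 30_000, 100));
    }
}
```

The handler could then return an ignored response as soon as awaitRunning comes back false, rather than holding the caller's RPC for the whole timeout.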
>>
>> When I looked at this a while back, I'd ideally have liked to be able
>> to start a cache associated with the unknown cache request; however,
>> this is not feasible because you can't know what configuration it should
>> be started with.
>>
>> At first glance, a different valid status would be the way forward,
>>
>> but you have to think about the state transfer and distribution logic
>>
>> and that's the hard bit. If a cache is started in a non-coordinator,
>>
>> and the coordinator has not yet started that cache, how does state
>>
>> transfer or rehash control work? Both of them rely on some kind of
>>
>> logic running on coordinator. Now, who's the coordinator in that case?
>>
>> The coordinator is in theory the first node started, but what if the
>>
>> cache is not yet started in the coordinator? The coordinator now
>>
>> becomes a variant of the Cache rather than the CacheManager.
>>
>> I think the latter is the bigger problem to solve here.
>>
>> Agreed.
>>
>> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>>
>> OK - the plot thickens...
>>
>> RequestIgnoredResponse is not actually appropriate because it's an
>>
>> invalid response (i.e. extends InvalidResponse). Oops.
>>
>> So, not only would we either need to return a valid response (perhaps
>>
>> null, like the behavior prior to ISPN-447 ?), but an RPC for a stopped
>>
>> (or stopping) cache should also be considered valid. For example, if I
>>
>> have an app deployed on 2 nodes, and I undeploy the app from node2, this
>>
>> would cause RPC-bound cache operations to fail on node1. Actually,
>> these RPCs would time out, since the InboundInvocationHandler will wait
>> 30 seconds for the cache to start. That's no good.
>>
>> To address this would require some changes to the behavior of some of
>>
>> the ComponentStatus values. For example, ComponentStatus.startingUp()
>>
>> returns true for STOPPING and TERMINATED, and consequently
>>
>> InboundInvocationHandler loops for 30 seconds hoping the cache will
>>
>> start. That doesn't seem appropriate for the use case above. Would it
>>
>> be possible to return a valid ignored response (e.g. null) for these
>>
>> states?
>>
>> Thoughts?
>>
>> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>>
>> In AS clustering, there are several use cases where a specific cache
>>
>> instance may not exist (or may not be started) for every member of the
>>
>> group. Currently, Infinispan treats this as an exception case, and any
>>
>> cache operation resulting in an RPC will fail. This is problematic for
>>
>> the following AS use cases:
>>
>> 1. For a given clustering service (e.g. web session, SFSBs, entity
>>
>> caching) there is a shared cache manager for all applications, while
>>
>> each application uses its own cache instance. If I have app1 running on
>>
>> node1 and node2, everything is fine. But if I deploy app2 on node1,
>>
>> its membership will include node2 (because of the shared cache manager)
>>
>> even though there is no cache instance for app2 on node2. Consequently,
>>
>> the cache instances for app2 will be non-functional until app2 is
>>
>> deployed on node2.
>>
>> 2. In Hibernate's 2nd level cache, custom cache regions are created on
>>
>> demand. So, even with a single app running on 2 nodes, the first
>>
>> request to cache an entity in a custom cache region on node1 will fail,
>>
>> since the cache corresponding to the region will not exist on node2.
>>
>> Here's the relevant code in
>>
>> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>>
>> String cacheName = cmd.getCacheName();
>> ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>> long giveupTime = System.currentTimeMillis() + 30000; // arbitrary (?) wait time for caches to start
>> while (cr == null && System.currentTimeMillis() < giveupTime) {
>>    Thread.sleep(100);
>>    cr = gcr.getNamedComponentRegistry(cacheName);
>> }
>> if (cr == null) {
>>    if (log.isDebugEnabled()) log.debug("Cache named {0} does not exist on this cache manager!", cacheName);
>>    return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>>    // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>> }
>>
>> From the perspective of the AS, a request for a non-existent cache should
>>
>> be treated the same way as a request for a stopped cache (that logic
>>
>> returns RequestIgnoredResponse.INSTANCE).
>>
>> As Galder pointed out, handling this case via exception was an explicit
>>
>> workaround for this issue: https://jira.jboss.org/browse/ISPN-447
>>
>> In the comments for ISPN-447, Manik seemed to suggest that returning an
>>
>> exception is merely a workaround until this issue is fixed:
>>
>> https://jira.jboss.org/browse/ISPN-434
>>
>> As it stands, this is a blocker issue for AS infinispan integration.
>>
>> Thoughts?
>>
>> _______________________________________________
>>
>> infinispan-dev mailing list
>>
>> infinispan-dev at lists.jboss.org
>>
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>> --
>>
>> Galder Zamarreño
>>
>> Sr. Software Engineer
>>
>> Infinispan, JBoss Cache
>>
>>
>> --
>>
>> Manik Surtani
>>
>> manik at jboss.org
>>
>> Lead, Infinispan
>>
>> Lead, JBoss Cache
>>
>> http://www.infinispan.org
>>
>> http://www.jbosscache.org
>>
>
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org