2010/9/14 Manik Surtani <manik(a)jboss.org>:
> FYI, just checked in
> http://fisheye.jboss.org/changelog/Infinispan/branches/4.2.x?cs=2361
> and tests run clean.
I just ran into this. After finding ISPN-648 I tried setting
strictPeerToPeer="false". This initially seemed to improve the
situation, as no more complaints about nonexistent caches were logged,
but then I got timeouts during state transfers, so I opened ISPN-661
(which contains the full stack traces of this timeout).
After changing Infinispan to time out after 10 minutes, I'm back to the
exception:
"[org.infinispan.remoting.InboundInvocationHandlerImpl] Cache named
(cachename) does not exist on this cache manager!"
Is there any known workaround to have a second node join the cluster
while not all caches are initialized at the same time?
BTW, from what I understand I definitely need
strictPeerToPeer="false". Shouldn't this be the default?
I think my use case is quite common: I just start more than one cache
lazily. (I also can't pre-start them, as the configuration is not known
until a service requests a cache; the invoker's context affects this
configuration.)
Cheers,
Sanne
>
> On 14 Sep 2010, at 15:27, Manik Surtani wrote:
>
> On 14 Sep 2010, at 05:04, Paul Ferraro wrote:
>
> On Mon, 2010-09-13 at 18:12 +0100, Manik Surtani wrote:
>
> So in essence a "correct" response would be:
>
> 1) If the cache is stopping -> ACK with a ValidResponse
>
> Do we have a notion of an ignored (but not invalid) response, i.e. one
> that doesn't trigger a retry/rollback?
>
> We can certainly change this for RequestIgnoredResponse by overriding
> isValid() to return true since it is, as you say, a valid response. Would
> need to run through the test suite to make sure such a change doesn't break
> anything though.
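
A minimal sketch of the isValid() override suggested above. The types here are simplified stand-ins that only borrow the class names mentioned in this thread; they are not the real Infinispan classes, and shouldRetry() is a hypothetical caller-side rule added purely for illustration.

```java
public class IgnoredResponseSketch {

    /** Stand-in for a response type; only the validity check is modeled. */
    public interface Response {
        boolean isValid();
    }

    /** Stand-in for an invalid response: callers treat it as a failure. */
    public static class InvalidResponse implements Response {
        @Override
        public boolean isValid() {
            return false;
        }
    }

    /** Ignored-request marker that keeps its parent type but reports itself as valid. */
    public static final class RequestIgnoredResponse extends InvalidResponse {
        public static final RequestIgnoredResponse INSTANCE = new RequestIgnoredResponse();

        private RequestIgnoredResponse() {
        }

        @Override
        public boolean isValid() {
            return true;
        }
    }

    /** Hypothetical caller-side rule: retry or roll back only on invalid responses. */
    public static boolean shouldRetry(Response response) {
        return !response.isValid();
    }

    public static void main(String[] args) {
        System.out.println("ignored -> retry? " + shouldRetry(RequestIgnoredResponse.INSTANCE));
        System.out.println("invalid -> retry? " + shouldRetry(new InvalidResponse()));
    }
}
```

With the override in place, code that only looks at isValid() no longer treats an ignored request as a failure, while code that checks the concrete type can still tell the two apart.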
>
> 2) If the cache is starting, try and wait till we can accept the RPC
>
> Yes, except that ComponentStatus.startingUp() currently returns true for
> every status except RUNNING. IMO, it would make more sense to
> restrict this to INSTANTIATED and INITIALIZING.
>
> Again, startingUp() would need to be fixed accordingly - and tested.
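
The difference between the two startingUp() definitions can be sketched as follows. The Status enum is a stand-in that lists only the states named in this thread (the real ComponentStatus may have others), and both method names are hypothetical.

```java
public class StartingUpSketch {

    /** Stand-in for ComponentStatus, limited to the states named in the thread. */
    public enum Status {
        INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

        /** Behavior as described in the thread: everything except RUNNING counts as starting. */
        public boolean startingUpAsDescribed() {
            return this != RUNNING;
        }

        /** Proposed restriction: only the two pre-start states count as starting. */
        public boolean startingUpProposed() {
            return this == INSTANTIATED || this == INITIALIZING;
        }
    }

    public static void main(String[] args) {
        // Print both answers side by side for every state.
        for (Status s : Status.values()) {
            System.out.println(s + ": described=" + s.startingUpAsDescribed()
                    + ", proposed=" + s.startingUpProposed());
        }
    }
}
```

The two definitions disagree only for STOPPING and TERMINATED, which are exactly the states where waiting for the cache to start makes no sense.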
>
>
> 3) If the cache doesn't exist, ACK with a valid response as well? Surely
> this will lead to inconsistencies, since the RPC originator will assume the
> RPC has completed when in fact nothing has happened?
>
> From the AS's perspective, an RPC for a non-existent cache (e.g. yet to
> be deployed app) should be handled no differently than an RPC for a
> stopping/stopped cache (e.g. undeployed app).
>
> I'm not suggesting we should lie to the RPC originator, but rather
> that it should be able to distinguish a normal valid response from an
> ignored (but valid) response.
>
> Agreed, but how does this difference manifest itself from a caller's
> perspective?
>
>
> On 13 Sep 2010, at 15:19, Paul Ferraro wrote:
>
> On Mon, 2010-09-13 at 15:05 +0200, Galder Zamarreño wrote:
>
> I've had a brief look at this. I need to spend a bit more time on it,
> but here's an initial view:
>
> At the moment at least, InboundInvocationHandlerImpl doesn't take
> ComponentStatus into account to see if it's up. It only checks whether the
> component registry is null, but a ComponentStatus check might make
> more sense.
>
> After the component registry null check comes the following:
>
>    if (!cr.getStatus().allowInvocations()) {
>       giveupTime = System.currentTimeMillis()
>             + localConfig.getStateRetrievalTimeout();
>       while (cr.getStatus().startingUp()
>             && System.currentTimeMillis() < giveupTime)
>          Thread.sleep(100);
>       if (!cr.getStatus().allowInvocations()) {
>          log.warn("Cache named [{0}] exists but isn't in a state to handle invocations. Its state is {1}.",
>                cacheName, cr.getStatus());
>          return RequestIgnoredResponse.INSTANCE;
>       }
>    }
>
> So, there is, in fact, a ComponentStatus check. If the registry is not
> RUNNING, then we spin for up to 30 seconds for the status to become
> RUNNING. For a stopping or stopped cache, this does not seem to make
> sense, since these states do not indicate that the cache is in the
> process of starting.
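
To illustrate the point about stopping and stopped caches, here is a sketch of a bounded wait that polls only while the cache is genuinely starting. Status, Outcome, and awaitRunning() are hypothetical stand-ins, and allowInvocations() is assumed here to mean "RUNNING only"; none of this is the actual InboundInvocationHandlerImpl code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class InboundWaitSketch {

    /** Stand-in for ComponentStatus, using the restricted startingUp() discussed in the thread. */
    public enum Status {
        INSTANTIATED, INITIALIZING, RUNNING, STOPPING, TERMINATED;

        public boolean allowInvocations() {
            return this == RUNNING; // assumption made for this sketch
        }

        public boolean startingUp() {
            return this == INSTANTIATED || this == INITIALIZING;
        }
    }

    public enum Outcome { PROCEED, IGNORED }

    /** Polls while the cache is starting, up to timeoutMillis; any other non-running state returns at once. */
    public static Outcome awaitRunning(Supplier<Status> status, long timeoutMillis, long pollMillis) {
        long giveUpTime = System.currentTimeMillis() + timeoutMillis;
        try {
            while (status.get().startingUp() && System.currentTimeMillis() < giveUpTime) {
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Outcome.IGNORED;
        }
        return status.get().allowInvocations() ? Outcome.PROCEED : Outcome.IGNORED;
    }

    public static void main(String[] args) {
        // A stopping cache is answered immediately instead of after the full timeout.
        System.out.println(awaitRunning(() -> Status.STOPPING, 30_000, 100));

        // A cache that finishes starting after a few polls is allowed to proceed.
        AtomicInteger polls = new AtomicInteger();
        Supplier<Status> starting =
                () -> polls.incrementAndGet() < 3 ? Status.INITIALIZING : Status.RUNNING;
        System.out.println(awaitRunning(starting, 30_000, 10));
    }
}
```

The loop condition is the only thing that changes relative to the quoted snippet: because startingUp() is false for STOPPING and TERMINATED, the handler falls straight through to the ignored response.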
>
> When I looked at this a while back, I'd ideally have liked to be able
> to start a cache associated with the unknown cache request. However,
> this is not feasible because you can't know what configuration it should
> be started with.
>
> At first glance, a different valid status would be the way forward,
> but you have to think about the state transfer and distribution logic,
> and that's the hard bit. If a cache is started on a non-coordinator,
> and the coordinator has not yet started that cache, how does state
> transfer or rehash control work? Both of them rely on some kind of
> logic running on the coordinator. Now, who's the coordinator in that case?
> The coordinator is in theory the first node started, but what if the
> cache is not yet started on the coordinator? The coordinator now
> becomes a variant of the Cache rather than the CacheManager.
>
> I think the latter is the bigger problem to solve here.
>
> Agreed.
>
> On Sep 10, 2010, at 7:16 PM, Paul Ferraro wrote:
>
> OK - the plot thickens...
>
> RequestIgnoredResponse is not actually appropriate because it's an
> invalid response (i.e. extends InvalidResponse). Oops.
>
> So, not only would we need to return a valid response (perhaps
> null, like the behavior prior to ISPN-447?), but an RPC for a stopped
> (or stopping) cache should also be considered valid. For example, if I
> have an app deployed on 2 nodes, and I undeploy the app from node2, this
> would cause RPC-bound cache operations to fail on node1. Actually,
> these RPCs would time out, since the InboundInvocationHandler will wait
> 30 seconds for the cache to start. That's no good.
>
> To address this would require some changes to the behavior of some of
> the ComponentStatus values. For example, ComponentStatus.startingUp()
> returns true for STOPPING and TERMINATED, and consequently
> InboundInvocationHandler loops for 30 seconds hoping the cache will
> start. That doesn't seem appropriate for the use case above. Would it
> be possible to return a valid ignored response (e.g. null) for these
> states?
>
> Thoughts?
>
> On Fri, 2010-09-10 at 11:54 -0400, Paul Ferraro wrote:
>
> In AS clustering, there are several use cases where a specific cache
> instance may not exist (or may not be started) for every member of the
> group. Currently, Infinispan treats this as an exception case, and any
> cache operation resulting in an RPC will fail. This is problematic for
> the following AS use cases:
>
> 1. For a given clustering service (e.g. web session, SFSBs, entity
> caching) there is a shared cache manager for all applications, while
> each application uses its own cache instance. If I have app1 running on
> node1 and node2, everything is fine. But if I deploy app2 on node1,
> its membership will include node2 (because of the shared cache manager)
> even though there is no cache instance for app2 on node2. Consequently,
> the cache instances for app2 will be non-functional until app2 is
> deployed on node2.
>
> 2. In Hibernate's 2nd level cache, custom cache regions are created on
> demand. So, even with a single app running on 2 nodes, the first
> request to cache an entity in a custom cache region on node1 will fail,
> since the cache corresponding to the region will not exist on node2.
>
> Here is the relevant code in
> InboundInvocationHandlerImpl.handle(CacheRpcCommand):
>
>    String cacheName = cmd.getCacheName();
>    ComponentRegistry cr = gcr.getNamedComponentRegistry(cacheName);
>    // arbitrary (?) wait time for caches to start
>    long giveupTime = System.currentTimeMillis() + 30000;
>    while (cr == null && System.currentTimeMillis() < giveupTime) {
>       Thread.sleep(100);
>       cr = gcr.getNamedComponentRegistry(cacheName);
>    }
>
>    if (cr == null) {
>       if (log.isDebugEnabled())
>          log.debug("Cache named {0} does not exist on this cache manager!",
>                cacheName);
>       return new ExceptionResponse(new NamedCacheNotFoundException(cacheName));
>       // return RequestIgnoredResponse.INSTANCE; // Suggested fix?
>    }
>
> From the perspective of the AS, a request for a non-existent cache should
> be treated the same way as a request for a stopped cache (that logic
> returns RequestIgnoredResponse.INSTANCE).
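
The requested behavior, treating an unknown cache the same way as a stopped one, can be sketched with a plain map standing in for the component registry lookup. Everything here (the class, the Reply enum, register(), handle()) is hypothetical, and the 30 second wait from the quoted code is deliberately left out to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MissingCacheSketch {

    public enum Reply { HANDLED, IGNORED }

    /** Hypothetical registry: cache name -> whether that cache is currently running. */
    private final Map<String, Boolean> caches = new ConcurrentHashMap<>();

    public void register(String cacheName, boolean running) {
        caches.put(cacheName, running);
    }

    /** Answers an inbound request for the named cache. */
    public Reply handle(String cacheName) {
        Boolean running = caches.get(cacheName);
        if (running == null) {
            // Unknown cache (e.g. app not deployed on this node): ignore instead of raising an error.
            return Reply.IGNORED;
        }
        if (!running) {
            // Known but stopped cache (e.g. app undeployed): same answer as above.
            return Reply.IGNORED;
        }
        return Reply.HANDLED;
    }

    public static void main(String[] args) {
        MissingCacheSketch node2 = new MissingCacheSketch();
        node2.register("app1", true);
        node2.register("app3", false);

        System.out.println("app1 -> " + node2.handle("app1"));
        System.out.println("app2 -> " + node2.handle("app2"));
        System.out.println("app3 -> " + node2.handle("app3"));
    }
}
```

Whether the originator may safely treat such an ignored reply as success is the open question raised elsewhere in this thread; the sketch only shows the receiving side.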
>
> As Galder pointed out, handling this case via exception was an explicit
> workaround for this issue:
> https://jira.jboss.org/browse/ISPN-447
>
> In the comments for ISPN-447, Manik seemed to suggest that returning an
> exception is merely a workaround until this issue is fixed:
> https://jira.jboss.org/browse/ISPN-434
>
> As it stands, this is a blocker issue for AS Infinispan integration.
>
> Thoughts?
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
>
>
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>