[infinispan-dev] ISPN-425 - Issues with waiting for rehash to complete on startup

Manik Surtani manik at jboss.org
Mon May 10 11:06:28 EDT 2010


Well, firstly, is it even feasible for Hot Rod clients to be starting cache instances lazily?  Should we (for now, anyway) mandate that the cache names used in Hot Rod refer to caches that have already been configured and are running on the backend?

Also, when you start the server using startServer.sh, which caches do you start?  Just the default?  All configured in infinispan.xml?

Cheers
Manik


On 10 May 2010, at 16:02, Galder Zamarreno wrote:

> I've tried starting the cache, but it does not seem to work as expected, since the JOIN_REQ is sent to itself (the coordinator) and the target is removed at the transport level. You could maybe hack around this by calling DMI directly if you yourself are the target, but that seems to cause more havoc. I'm looking for alternative solutions.
> 
> ----- "Galder Zamarreno" <galder at jboss.org> wrote:
> 
>> Actually, doing this seems more complex than I originally thought,
>> especially when you take into account that you could have multiple
>> caches checking with the coordinator and expecting it to return the
>> CH address list. For example, if you have nodes A, B and C, and
>> assuming B and C had requests on 'hotRodDistSync', B could assume
>> it's only itself, but when C comes in, how is A going to tell that B
>> is part of the CH address list?
>> 
>> Another possibility I'm toying with is for the coordinator to start
>> the given cache if it does not exist in the cache manager.
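
[Editor's sketch: the coordinator-side lazy start quoted above could look roughly like this. A minimal sketch only; `LazyJoinCoordinator`, `onJoinRequest` and the reply string are hypothetical names, not actual Infinispan 4.x API.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical coordinator-side handling of JOIN_REQ: instead of replying
// null for a cache it has not started, the coordinator starts the cache
// first and then answers with the consistent hash address list.
public class LazyJoinCoordinator {
    private final Map<String, Boolean> runningCaches = new ConcurrentHashMap<>();

    public String onJoinRequest(String cacheName) {
        // Start the cache on first reference; a real implementation would
        // call something like cacheManager.getCache(cacheName) here.
        runningCaches.putIfAbsent(cacheName, Boolean.TRUE);
        // Reply with the old CH address list instead of null.
        return "chAddressListFor:" + cacheName;
    }

    public boolean isRunning(String cacheName) {
        return runningCaches.containsKey(cacheName);
    }
}
```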
>> 
>> ----- galder at redhat.com wrote:
>> 
>>> Hi,
>>> 
>>> Re: https://jira.jboss.org/jira/browse/ISPN-425
>>> 
>>> We've been discussing solutions to the fundamental problem in this
>>> issue, which is that operations are allowed on the cache before
>>> rehashing has finished on startup. I've been playing around with a
>>> solution based on waiting for rehashing to complete, but this is
>>> causing issues with the Hot Rod distribution tests. In Hot Rod, this
>>> is what happens:
>>> 
>>> 1. Start Hot Rod server 1 which starts a replicated topology cache.
>>> 2. Start Hot Rod server 2 which starts a replicated topology cache.
>>> 3. Send a request for a distributed cache called 'hotRodDistSync' in
>>> Hot Rod server 2.
>>> 4. As a result of this request, the 'hotRodDistSync' cache should be
>>> started, but this does not succeed: it stays in a rehash join loop,
>>> logging:
>>> 
>>> 4595  INFO  [org.infinispan.remoting.InboundInvocationHandlerImpl]
>>> (OOB-2,Infinispan-Cluster,eq-52045:) Cache named hotRodDistSync does
>>> not exist on this cache manager!
>>> 4595  TRACE [org.infinispan.marshall.VersionAwareMarshaller]
>>> (OOB-2,Infinispan-Cluster,eq-52045:) Wrote version 410
>>> 4596  TRACE [org.infinispan.marshall.VersionAwareMarshaller]
>>> (OOB-2,Infinispan-Cluster,eq-64501:) Read version 410
>>> 4596  TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
>>> (Rehasher-eq-64501:) responses: [sender=eq-52045, retval=null,
>>> received=true, suspected=false]
>>> 
>>> 4597  DEBUG [org.infinispan.distribution.JoinTask]
>>> (Rehasher-eq-64501:) Retrieved old consistent hash address list null
>>> 4597  TRACE [org.infinispan.distribution.JoinTask]
>>> (Rehasher-eq-64501:) Sleeping for 1.54 seconds
>>> 
>>> The problem here is that Hot Rod server 1 has not yet started the
>>> 'hotRodDistSync' cache, since no requests were sent to it. Note that
>>> this is different from the cache not yet allowing invocations because
>>> it's in the middle of starting up. So, I wondered whether
>>> InboundInvocationHandlerImpl.handle() could return a custom response
>>> rather than null, and whether JoinTask could handle it such that, if
>>> all the responses received say the cache does not exist, rehash is
>>> considered complete and the process finishes.
>>> 
>>> Now, the reason I'm suggesting a custom response is that JOIN_REQ
>>> returning null can also mean that the coordinator is in the middle of
>>> another join (DMI.requestPermissionToJoin). These two situations are
>>> not the same, hence my suggestion to treat them differently.
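
[Editor's sketch: the distinction proposed above could be made concrete with an explicit reply type in place of the ambiguous null. The `JoinResponse` enum and `decide` helper below are illustrative names only, not actual Infinispan API.]

```java
import java.util.List;

// Hypothetical explicit JOIN_REQ reply, so a joiner can tell "cache not
// defined on that node" apart from "coordinator busy with another join".
public class JoinDecision {
    public enum JoinResponse { CACHE_NOT_DEFINED, JOIN_IN_PROGRESS, ADDRESS_LIST }

    // Joiner-side decision over the collected responses.
    public static String decide(List<JoinResponse> responses) {
        if (responses.stream().allMatch(r -> r == JoinResponse.CACHE_NOT_DEFINED)) {
            // No other node runs this cache: consider rehash complete.
            return "COMPLETE";
        }
        if (responses.stream().anyMatch(r -> r == JoinResponse.JOIN_IN_PROGRESS)) {
            // Another join is in flight: back off and retry, as today.
            return "RETRY";
        }
        // At least one node returned an address list: proceed with the join.
        return "PROCEED";
    }
}
```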
>>> 
>>> Cheers,
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
> 

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
