Re: [infinispan-dev] ISPN-425 - Issues with waiting for rehash to complete on startup

Monday, 10 May 2010

Actually, doing this seems more complex than I originally thought, especially when you take
into account that you could have multiple caches checking the coordinator and expecting it
to return the consistent hash (CH) address list. For example, if you have nodes A, B and C,
and B and C both get requests on 'hotRodDistSync', B could assume it's the only member, but
when C comes in, how is A going to tell C that B is part of the CH address list?

Another possibility I'm toying with is for the coordinator to start the given cache if
it does not yet exist in its cache manager.
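
To make the A/B/C case concrete, here is a rough sketch in plain Java. It is not Infinispan code: the Node class and its methods are made up for illustration, and it assumes A is the coordinator, has not started 'hotRodDistSync', and that a joiner treats "cache does not exist" as "I'm the only member".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Node and its methods are hypothetical, not Infinispan classes.
final class LazyCacheJoinSketch {

    /** A node that only knows a cache's members once that cache is started locally. */
    static final class Node {
        final String name;
        final Map<String, List<String>> membersByCache = new HashMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Returns the address list for the cache, or null if the cache is not started here. */
        List<String> addressList(String cache) {
            return membersByCache.get(cache);
        }

        /** Joins by asking the coordinator; a null answer is read as "nobody else has it". */
        void join(String cache, Node coordinator) {
            List<String> known = coordinator.addressList(cache);
            List<String> view = new ArrayList<>();
            if (known != null) {
                view.addAll(known);
            }
            view.add(name);
            membersByCache.put(cache, view);
        }
    }

    /** Runs the scenario and returns B's view followed by C's view of the address list. */
    static List<List<String>> run() {
        Node a = new Node("A"); // coordinator, never starts the cache
        Node b = new Node("B");
        Node c = new Node("C");
        b.join("hotRodDistSync", a);
        c.join("hotRodDistSync", a);
        return Arrays.asList(b.addressList("hotRodDistSync"), c.addressList("hotRodDistSync"));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this toy model B and C each end up with a view containing only themselves, because A never records that B joined. Having the coordinator start the cache on demand would give it somewhere to keep that list.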

----- galder(a)redhat.com wrote:

...
 Hi,

 Re: https://jira.jboss.org/jira/browse/ISPN-425

 We've been discussing solutions to the fundamental problem in this
 issue, which is that operations are allowed on the cache before
 rehashing has finished at startup. I've been playing around with a
 solution based on waiting for rehashing to complete, but this is
 causing issues with the Hot Rod distribution tests. In Hot Rod, this
 is what happens:

 1. Start Hot Rod server 1 which starts a replicated topology cache.
 2. Start Hot Rod server 2 which starts a replicated topology cache.
 3. Send a request for a distributed cache called 'hotRodDistSync' to
 Hot Rod server 2.
 4. As a result of this request, the 'hotRodDistSync' cache should be
 started, but startup does not succeed. It stays in a rehash join
 loop, saying:

 4595  INFO  [org.infinispan.remoting.InboundInvocationHandlerImpl]
 (OOB-2,Infinispan-Cluster,eq-52045:) Cache named hotRodDistSync does
 not exist on this cache manager!
 4595  TRACE [org.infinispan.marshall.VersionAwareMarshaller]
 (OOB-2,Infinispan-Cluster,eq-52045:) Wrote version 410
 4596  TRACE [org.infinispan.marshall.VersionAwareMarshaller]
 (OOB-2,Infinispan-Cluster,eq-64501:) Read version 410
 4596  TRACE
 [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
 (Rehasher-eq-64501:) responses: [sender=eq-52045, retval=null,
 received=true, suspected=false]

 4597  DEBUG [org.infinispan.distribution.JoinTask]
 (Rehasher-eq-64501:) Retrieved old consistent hash address list null
 4597  TRACE [org.infinispan.distribution.JoinTask]
 (Rehasher-eq-64501:) Sleeping for 1.54 seconds

 The problem here is that Hot Rod server 1 has not yet started the
 'hotRodDistSync' cache since no requests were sent to it. Now, this
 is different from the cache not allowing invocations yet because
 it's in the middle of startup. So, I wondered if
 InboundInvocationHandlerImpl.handle() could return a custom response
 rather than null, and JoinTask could handle it in such a way that if
 all the responses received say that the cache does not exist, it
 considers the rehash completed and finishes the process.

 Now, the reason I'm suggesting a custom response is that JOIN_REQ
 returning null can also mean that the coordinator is in the middle of
 another join (DMI.requestPermissionToJoin). These two situations are
 not the same, hence the different treatment.
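
A minimal sketch of how the joiner could tell the two cases apart, assuming the handler returned a distinct marker instead of null. JoinReply, JoinAction and decide() are hypothetical names for illustration, not existing Infinispan types, and treating an empty response list as "retry" is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and not part of Infinispan.
final class JoinResponseSketch {

    /** Possible answers to a join request; today the first two both come back as null. */
    enum JoinReply {
        CACHE_NOT_FOUND,   // the cache is not started on the responding node
        JOIN_IN_PROGRESS,  // the responder is busy with another join
        ADDRESS_LIST       // the responder returned the old consistent hash address list
    }

    /** What the joining node does next. */
    enum JoinAction { FINISH_WITHOUT_REHASH, RETRY_LATER, PROCEED_WITH_REHASH }

    static JoinAction decide(List<JoinReply> replies) {
        // Assumption: with no replies at all, stay conservative and retry.
        if (replies.isEmpty()) {
            return JoinAction.RETRY_LATER;
        }
        // Somebody knows the old address list, so a real rehash is needed.
        if (replies.contains(JoinReply.ADDRESS_LIST)) {
            return JoinAction.PROCEED_WITH_REHASH;
        }
        // Only finish early if every responder says the cache does not exist.
        for (JoinReply reply : replies) {
            if (reply != JoinReply.CACHE_NOT_FOUND) {
                return JoinAction.RETRY_LATER;
            }
        }
        return JoinAction.FINISH_WITHOUT_REHASH;
    }

    public static void main(String[] args) {
        System.out.println(decide(Arrays.asList(JoinReply.CACHE_NOT_FOUND)));
        System.out.println(decide(Arrays.asList(JoinReply.CACHE_NOT_FOUND, JoinReply.JOIN_IN_PROGRESS)));
    }
}
```

The point of the separate marker is that a node in the middle of another join still leads to a retry (the sleep-and-retry loop in the log above), while "cache does not exist" from everyone lets the join finish.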

 Cheers,
 --
 Galder Zamarreño
 Sr. Software Engineer
 Infinispan, JBoss Cache

 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev 
